Quantum state discrimination bounds 
for finite sample size 



Koenraad M.R. Audenaert, Milan Mosonyi, Frank Verstraete



Mathematics Department, Royal Holloway, University of London
Egham TW20 0EX, United Kingdom

School of Mathematics, University of Bristol
University Walk, Bristol, BS8 1TW, United Kingdom

Mathematical Institute, Budapest University of Technology and Economics
Egry József u 1., Budapest, 1111 Hungary

Fakultät für Physik, Universität Wien
Boltzmanngasse 5, A-1090 Wien, Austria

Abstract

In the problem of quantum state discrimination, one has to determine by measurements the state of a quantum system, based on the a priori side information that the true state is one of two given and completely known states, $\rho$ or $\sigma$. In general, it is not possible to decide the identity of the true state with certainty, and the optimal measurement strategy depends on whether the two possible errors (mistaking $\rho$ for $\sigma$, or the other way around) are treated as of equal importance or not. Results on the quantum Chernoff and Hoeffding bounds and the quantum Stein's lemma show that, if several copies of the system are available, then the optimal error probabilities decay exponentially in the number of copies, and the decay rate is given by a certain statistical distance between $\rho$ and $\sigma$ (the Chernoff distance, the Hoeffding distances, and the relative entropy, respectively). While these results provide a complete solution to the asymptotic problem, they are not completely satisfying from a practical point of view. Indeed, in realistic scenarios one has access only to finitely many copies of a system, and therefore it is desirable to have bounds on the error probabilities for finite sample size. In this paper we provide finite-size bounds on the so-called Stein errors, the Chernoff errors, the Hoeffding errors and the mixed error probabilities related to the Chernoff and the Hoeffding errors.

1 Introduction



Assume we have a quantum system with a finite-dimensional Hilbert space $\mathcal{H}$, and we know that the system has been prepared either in state $\rho$ (this is our null hypothesis $H_0$) or state $\sigma$ (this is our alternative hypothesis $H_1$). (By a state we mean a density operator, i.e., a positive semidefinite operator with trace 1.) The goal of state discrimination is to come up with a "good" guess for the true state, based on measurements on the system. By "good" we mean that some error probability is minimal; we will specify this later. We will study the asymptotic scenario, where we assume that several identical and independent (i.i.d.) copies of the system are available, and we are allowed to make arbitrary collective measurements on the system. Due to the i.i.d. assumption, the joint state of the $n$-copy system is either $\rho_n := \rho^{\otimes n}$ or $\sigma_n := \sigma^{\otimes n}$ for every $n \in \mathbb{N}$.

A test on $n$ copies is an operator $T \in \mathcal{B}(\mathcal{H}^{\otimes n})$, $0 \le T \le I_n$, that determines the binary POVM $(T, I_n - T)$. If the outcome corresponding to $T$ occurs then we accept the null hypothesis to be true, otherwise we accept the alternative hypothesis. Of course, we might make an error by concluding that the true state is $\sigma$ when it is actually $\rho$ (error of the first kind or type I error) or the other way around (error of the second kind or type II error). The probabilities of these errors when the measurement $(T, I_n - T)$ was performed are given by

$$\alpha_n(T) := \mathrm{Tr}\,\rho_n(I_n - T) \quad \text{(first kind)} \qquad \text{and} \qquad \beta_n(T) := \mathrm{Tr}\,\sigma_n T \quad \text{(second kind)}.$$

Unless $\rho_n$ and $\sigma_n$ are perfectly distinguishable (which is the case if and only if $\operatorname{supp}\rho_n \perp \operatorname{supp}\sigma_n$), the two error probabilities cannot be simultaneously eliminated, i.e., $\alpha_n(T) + \beta_n(T) > 0$ for any test $T$, and our aim is to find a joint optimum of the two error probabilities, according to some criteria.

In a Bayesian approach, one considers the scenario where $\rho$ and $\sigma$ are prepared with some prior probabilities $p$ and $1 - p$, respectively; the natural quantities to consider in this case are the so-called Chernoff errors, given by $\min_{T \text{ test}}\{p\,\alpha_n(T) + (1 - p)\,\beta_n(T)\}$. More generally, consider for any $\kappa, \lambda > 0$ the quantities

$$e_{n,\kappa,\lambda} := \min_{T \text{ test}}\{\kappa\,\alpha_n(T) + \lambda\,\beta_n(T)\}.$$

For a self-adjoint operator $X$ and constant $c \in \mathbb{R}$, let $\{X > c\}$ denote the spectral projection of $X$ corresponding to the interval $(c, +\infty)$. We define $\{X \ge c\}$, $\{X < c\}$ and $\{X \le c\}$ similarly. As one can easily see,

$$e_{n,\kappa,\lambda} = \frac{\kappa + \lambda}{2} - \frac{1}{2}\,\|\kappa\rho_n - \lambda\sigma_n\|_1 \tag{1}$$

(where $\|X\|_1 := \mathrm{Tr}\,|X|$ for any operator $X$), and the minimum is reached at any test $T$ satisfying

$$\{\kappa\rho_n - \lambda\sigma_n > 0\} \le T \le \{\kappa\rho_n - \lambda\sigma_n \ge 0\}.$$

Such a test is called a Neyman-Pearson test or Holevo-Helström test in the literature [T71 123]. By the above, such tests are optimal from the point of view of trade-off between the two error probabilities. Indeed, if $T$ is a Neyman-Pearson test corresponding to $\kappa$ and $\lambda$ then for any other test $S$ we have

$$\kappa\,\alpha_n(T) + \lambda\,\beta_n(T) \le \kappa\,\alpha_n(S) + \lambda\,\beta_n(S).$$

In particular, if $\alpha_n(S) < \alpha_n(T)$ then necessarily $\beta_n(S) > \beta_n(T)$ and vice versa, i.e., if $S$ performs better than a Neyman-Pearson test for one of the error probabilities then it necessarily performs worse for the other. This is the so-called quantum Neyman-Pearson lemma. For later use, we introduce the notations

$$\mathcal{N}_{n,a} := \{T \text{ test} : \{e^{-na}\rho_n - \sigma_n > 0\} \le T \le \{e^{-na}\rho_n - \sigma_n \ge 0\}\}$$

and

$$e_n(a) := e_{n,e^{-na},1} = \min_{T \text{ test}}\{e^{-na}\alpha_n(T) + \beta_n(T)\} = e^{-na}\alpha_n(T) + \beta_n(T), \qquad T \in \mathcal{N}_{n,a}, \tag{2}$$

where $a \in \mathbb{R}$ is a parameter.
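For commuting states the closed-form expression for $e_{n,\kappa,\lambda}$ and the optimality of the Neyman-Pearson test can be checked by hand. The following is a minimal sketch under that assumption: the states are represented as probability vectors `p` and `q` (hypothetical example data), a test is a vector of acceptance probabilities, and the function names are illustrative only.

```python
from itertools import product


def mixed_error(p, q, kappa, lam, test):
    """kappa*alpha(T) + lam*beta(T); test[x] is the probability of accepting the null at x."""
    alpha = sum(px * (1.0 - tx) for px, tx in zip(p, test))  # type I error
    beta = sum(qx * tx for qx, tx in zip(q, test))           # type II error
    return kappa * alpha + lam * beta


def closed_form(p, q, kappa, lam):
    """(kappa + lam)/2 - (1/2)*||kappa*p - lam*q||_1 for probability vectors p and q."""
    trace_norm = sum(abs(kappa * px - lam * qx) for px, qx in zip(p, q))
    return 0.5 * (kappa + lam) - 0.5 * trace_norm


def neyman_pearson_test(p, q, kappa, lam):
    """Indicator of the set {kappa*p - lam*q > 0}."""
    return [1.0 if kappa * px - lam * qx > 0 else 0.0 for px, qx in zip(p, q)]


# Hypothetical example data.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
kappa, lam = 1.0, 2.0

np_value = mixed_error(p, q, kappa, lam, neyman_pearson_test(p, q, kappa, lam))
# Minimum over all deterministic tests (the extreme points of the set of tests).
brute_force = min(
    mixed_error(p, q, kappa, lam, t) for t in product([0.0, 1.0], repeat=len(p))
)
```

Because the objective is linear in the test, minimizing over deterministic tests is enough in this classical toy case; the quantum statement needs the spectral projections of $\kappa\rho_n - \lambda\sigma_n$ instead.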

The following has been shown for the i.i.d. case in [2, 32] (see also [201 1221 EZl I2H] for various generalizations to correlated settings).

Theorem 1.1. For any $\kappa, \lambda > 0$ we have

$$-\lim_{n\to\infty}\frac{1}{n}\log e_{n,\kappa,\lambda} = -\lim_{n\to\infty}\frac{1}{n}\log e_n(0) = C(\rho\|\sigma) := -\inf_{0\le t\le 1}\log\mathrm{Tr}\,\rho^t\sigma^{1-t},$$

where $C(\rho\|\sigma)$ is called the Chernoff distance of $\rho$ and $\sigma$.
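As an illustration of Theorem 1.1 in the simplest commuting case, the sketch below computes the Chernoff distance of two probability vectors by a ternary search over $t$ (valid because $t \mapsto \log\sum_x p(x)^t q(x)^{1-t}$ is convex) and the exact error $e_n(0)$ for two Bernoulli sources. The function names and the example parameters are assumptions made for this sketch, not part of the original text.

```python
import math


def log_z(p, q, t):
    """log sum_x p(x)^t q(x)^(1-t), with powers taken on the common support."""
    return math.log(sum(px ** t * qx ** (1 - t) for px, qx in zip(p, q) if px > 0 and qx > 0))


def chernoff_distance(p, q, iters=200):
    """C(p||q) = -min over t in [0,1] of log_z, via ternary search on a convex function."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_z(p, q, m1) < log_z(p, q, m2):
            hi = m2
        else:
            lo = m1
    return -log_z(p, q, 0.5 * (lo + hi))


def chernoff_error_binary(p1, q1, n):
    """Exact e_n(0) = sum over sequences of min(p_n, q_n) for Bernoulli(p1) vs Bernoulli(q1)."""
    return sum(
        math.comb(n, k) * min(p1 ** k * (1 - p1) ** (n - k), q1 ** k * (1 - q1) ** (n - k))
        for k in range(n + 1)
    )


C = chernoff_distance([0.8, 0.2], [0.3, 0.7])
rates = {n: -math.log(chernoff_error_binary(0.8, 0.3, n)) / n for n in (10, 50, 200)}
```

For every finite $n$ the empirical rate in `rates` stays at or above `C`; the theorem states that it converges to `C` as $n \to \infty$.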

Another natural way to optimize the two error probabilities is to put a constraint on one of them and optimize the other one under this constraint. It is usual to optimize the type II error under the constraint that the type I error is kept under a constant error bar $\varepsilon \in (0,1)$, in which case we are interested in the quantities

$$\beta_{n,\varepsilon} := \min\{\beta_n(T) : T \text{ test},\ \alpha_n(T) \le \varepsilon\}. \tag{3}$$

Another natural choice is when an exponential constraint is imposed on the type I error, which gives

$$\beta_{n,e^{-nr}} := \min\{\beta_n(T) : T \text{ test},\ \alpha_n(T) \le e^{-nr}\} \tag{4}$$

for some fixed parameter $r > 0$. Unlike for the quantities $e_{n,\kappa,\lambda}$ above, there are no explicit expressions known for the values of $\beta_{n,\varepsilon}$ and $\beta_{n,e^{-nr}}$, or for the tests achieving them. However, the asymptotic behaviours are known also in these cases. The asymptotics of $\beta_{n,\varepsilon}$ is given by the quantum Stein's lemma, first proved for the i.i.d. case in [m |33] and later generalized to various correlated scenarios in [71 El [191 Ell 1221 1211 [2H].

Theorem 1.2. We have

$$-\lim_{n\to+\infty}\frac{1}{n}\log\beta_{n,\varepsilon} = -\inf_{\{T_n\}_{n\in\mathbb{N}}}\left\{\lim_{n\to\infty}\frac{1}{n}\log\beta_n(T_n) : \lim_{n\to\infty}\alpha_n(T_n) = 0\right\} = S(\rho\|\sigma),$$

where the infimum is taken over all sequences of measurements for which the indicated limit exists, and $S(\rho\|\sigma)$ is the relative entropy of $\rho$ with respect to $\sigma$.

The asymptotics of $\beta_{n,e^{-nr}}$ has been an open problem for a long time (see, e.g., [T3]), which was finally solved for the i.i.d. case in [16] and [30] (apart from some minor technicalities that were treated both in [2] and [21]), based on the techniques developed in [2] and [32]. These results were later extended to various correlated settings in [211 EH EH Eg.

Theorem 1.3. For any $r > 0$ we have

$$-\lim_{n\to\infty}\frac{1}{n}\log\beta_{n,e^{-nr}} = H_r(\rho\|\sigma) := -\inf_{0\le t<1}\frac{tr + \log\mathrm{Tr}\,\rho^t\sigma^{1-t}}{1-t},$$

where $H_r(\rho\|\sigma)$ is the Hoeffding distance of $\rho$ and $\sigma$ with parameter $r$.



It is not too difficult to see that Theorem 1.3 can also be reformulated in the following way:

$$H_r(\rho\|\sigma) = -\inf\left\{\lim_{n\to\infty}\frac{1}{n}\log\beta_n(T_n) : \limsup_{n\to\infty}\frac{1}{n}\log\alpha_n(T_n) < -r\right\},$$

where the infimum is taken over all possible sequences of tests for which the indicated limit exists (see [21] for details). This formulation makes it clear that the Hoeffding distance quantifies the trade-off between the two error probabilities in the sense that it gives the optimal exponential decay of the error of the second kind under the constraint that the error of the first kind decays with a given exponential speed.

While there is no explicit expression known for the optimal tests minimizing (3) and (4), it is known that the Neyman-Pearson tests are asymptotically optimal for this problem in the sense given in Theorem 1.4 below. For a positive semidefinite operator $X$ and $x \in \mathbb{R}$, let $P_x$ denote the spectral projection of $X$ corresponding to the singleton $\{x\}$. For every $t \in \mathbb{R}$, we define $X^t := \sum_{x>0} x^t P_x$. In particular, $X^0$ denotes the projection onto the support of $X$, i.e., $X^0 = \{X > 0\}$. The following was given in [21]:

Theorem 1.4. For any $r > -\log\mathrm{Tr}\,\rho\sigma^0$, let $a_r := H_r(\rho\|\sigma) - r$. For any sequence of tests $\{T_n\}$ satisfying $T_n \in \mathcal{N}_{n,a_r}$, $n \in \mathbb{N}$, we have

$$-\lim_{n\to\infty}\frac{1}{n}\log\alpha_n(T_n) = \hat\varphi(a_r) = r,$$

$$-\lim_{n\to\infty}\frac{1}{n}\log\beta_n(T_n) = -\lim_{n\to\infty}\frac{1}{n}\log e_n(a_r) = \varphi(a_r) = H_r(\rho\|\sigma),$$

where for every $a \in \mathbb{R}$,

$$\varphi(a) := \max_{0\le t\le 1}\{at - \log\mathrm{Tr}\,\rho^t\sigma^{1-t}\}, \qquad \hat\varphi(a) := \varphi(a) - a. \tag{5}$$

Theorems 1.1-1.4 give a complete solution to the asymptotic problem in the most generally considered setups. These results, however, rely on the assumption that one has access to an unlimited number of identical copies of the system in consideration, which of course is never satisfied in reality. Note also that the above results give no information about the error probabilities for finite sample size, which is the relevant question from a practical point of view. Our aim in this paper is to provide bounds on the finite-size error probabilities that can be more useful for applications. There are two similar but slightly different ways to do so; one is to consider the optimal type II errors for finite $n$; we treat this in Section 3. The other is to study the asymptotic behaviour of the error probabilities corresponding to the Holevo-Helström measurements, that are known to be asymptotically optimal; we provide bounds on these error probabilities in Section 4. In the special case where both hypotheses are classical binary probability measures, a direct computation yields bounds on the mixed error probabilities $e_n(a)$; we present this in the Appendix. Some of the technical background is summarized in Section 2 below.



2 Preliminaries 

2.1 Renyi relative entropies and related measures 

For positive semidefinite operators $A, B$ on a Hilbert space $\mathcal{K}$, we define their Rényi relative entropy with parameter $t \in [0, +\infty) \setminus \{1\}$ as

$$S_t(A\|B) := \begin{cases} \dfrac{1}{t-1}\log\mathrm{Tr}\,A^tB^{1-t} = \dfrac{1}{t-1}\,\psi_{A,B}(t), & \text{if } t \in [0,1) \text{ or } \operatorname{supp}A \le \operatorname{supp}B, \\ +\infty, & \text{otherwise,} \end{cases}$$

where

$$\psi_{A,B}(t) := \log Z_{A,B}(t), \qquad Z_{A,B}(t) := \mathrm{Tr}\,A^tB^{1-t}, \qquad t \in \mathbb{R}.$$

Here we use the convention $\log 0 := -\infty$ and $0^t := 0$, i.e., all powers are computed on the supports of $A$ and $B$, respectively. In particular, $S_t(A\|B) = +\infty$ if and only if $\operatorname{supp}A \perp \operatorname{supp}B$ and $t \in [0,1)$, or $\operatorname{supp}A \nleq \operatorname{supp}B$ and $t > 1$. Note that $Z_{A,B}(t)$ is a quasi-entropy in the sense of [S^. For most of what follows, we fix $A$ and $B$, and hence we omit them from the subscripts, i.e., we use $\psi$ instead of $\psi_{A,B}$, etc.

If $p$ is a positive measure on some finite set $\mathcal{X}$ then it can be naturally identified with a positive function, which we will denote the same way, i.e., we have the identity $p(\{x\}) = p(x)$, $x \in \mathcal{X}$. Moreover, $p$ can be naturally identified with a positive semidefinite operator on $\mathbb{C}^{\mathcal{X}} = l^2(\mathcal{X})$, which we again denote the same way; the matrix of this operator is given by $\langle e_x, p\,e_y\rangle = \delta_{x,y}\,p(x)$, where $\{e_x\}_{x\in\mathcal{X}}$ is the canonical basis of $\mathbb{C}^{\mathcal{X}}$. Given this identification, we can use the above definition to define the Rényi relative entropies of positive measures/functions $p$ and $q$ on some finite set $\mathcal{X}$, and we get $S_t(p\|q) = \frac{1}{t-1}\log\sum_{x\in\mathcal{X}} p(x)^t q(x)^{1-t}$ whenever $S_t(p\|q)$ is finite.

Let $A = \sum_{i\in I} a_iP_i$ and $B = \sum_{j\in J} b_jQ_j$ be decompositions of the positive semidefinite operators $A$ and $B$ such that $\{P_i\}$ and $\{Q_j\}$ are sets of orthogonal projections and $a_i, b_j > 0$ for all $i$ and $j$. Let $\mathcal{X}_{A,B} := \{(i,j) : \mathrm{Tr}\,P_iQ_j > 0\}$, and define

$$p_{A,B}(i,j) := a_i\,\mathrm{Tr}\,P_iQ_j, \qquad q_{A,B}(i,j) := b_j\,\mathrm{Tr}\,P_iQ_j, \qquad (i,j) \in \mathcal{X}_{A,B}. \tag{6}$$

Then $p = p_{A,B}$ and $q = q_{A,B}$ are positive measures on $\mathcal{X} = \mathcal{X}_{A,B}$, and we have

$$\psi_{A,B}(t) = \psi_{p,q}(t), \quad t \in \mathbb{R}, \qquad \text{and} \qquad S_t(A\|B) = S_t(p\|q), \quad t \in [0,1).$$

It is easy to see that

$$\operatorname{supp}p = \operatorname{supp}q = \mathcal{X} \qquad \text{and} \qquad p(\mathcal{X}) = \mathrm{Tr}\,AB^0, \quad q(\mathcal{X}) = \mathrm{Tr}\,A^0B.$$
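The mapping from a pair of operators to a pair of classical measures can be made concrete for a qubit. The sketch below is an illustration under stated assumptions: $A$ is taken diagonal with eigenvalues `a`, $B$ has eigenvalues `b` and real eigenvectors rotated by an angle `theta`, and all function names are hypothetical. It builds the measures $p$ and $q$ and compares $\sum_{(i,j)} p(i,j)^t q(i,j)^{1-t}$ with $\mathrm{Tr}\,A^tB^{1-t}$ computed from explicit $2\times 2$ matrices.

```python
import math


def classical_pair(a, b, theta):
    """Measures p(i,j) = a_i Tr P_i Q_j and q(i,j) = b_j Tr P_i Q_j for the qubit pair."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    overlap = {(0, 0): c2, (0, 1): s2, (1, 0): s2, (1, 1): c2}  # Tr P_i Q_j
    support = [ij for ij, w in overlap.items() if w > 0]
    p = {(i, j): a[i] * overlap[(i, j)] for (i, j) in support}
    q = {(i, j): b[j] * overlap[(i, j)] for (i, j) in support}
    return p, q


def z_classical(p, q, t):
    """sum over the support of p^t q^(1-t)."""
    return sum(p[x] ** t * q[x] ** (1 - t) for x in p)


def z_quantum(a, b, theta, t):
    """Tr A^t B^(1-t) from explicit 2x2 matrices (A diagonal, B rotated by theta)."""
    f = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
    b_pow = [
        [sum(b[j] ** (1 - t) * f[j][r] * f[j][c] for j in range(2)) for c in range(2)]
        for r in range(2)
    ]
    a_pow = [[a[0] ** t, 0.0], [0.0, a[1] ** t]]
    prod = [
        [sum(a_pow[r][k] * b_pow[k][c] for k in range(2)) for c in range(2)]
        for r in range(2)
    ]
    return prod[0][0] + prod[1][1]


def renyi(p, q, t):
    """Classical Renyi relative entropy S_t(p||q) for t != 1."""
    return math.log(z_classical(p, q, t)) / (t - 1)


# Hypothetical example: two full-rank qubit states.
a, b, theta = (0.7, 0.3), (0.6, 0.4), 0.4
p, q = classical_pair(a, b, theta)
```

Note that `p` and `q` live on a four-point set even though the Hilbert space is two-dimensional; this is the price paid for turning a non-commuting pair into a classical one.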

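Note that the decompositions of $A$ and $B$ are not unique, and hence neither are the set $\mathcal{X}$ and the measures $p$ and $q$. However, if $\mathcal{X}, p, q$ are defined through some decompositions $A = \sum_{i\in I} a_iP_i$ and $B = \sum_{j\in J} b_jQ_j$ then we will always assume that for every $n \in \mathbb{N}$, $\mathcal{X}_{A^{\otimes n},B^{\otimes n}}$, $p_{A^{\otimes n},B^{\otimes n}}$ and $q_{A^{\otimes n},B^{\otimes n}}$ are defined through the decompositions $A^{\otimes n} = \sum_{\underline{i}\in I^n} a_{\underline{i}}P_{\underline{i}}$ and $B^{\otimes n} = \sum_{\underline{j}\in J^n} b_{\underline{j}}Q_{\underline{j}}$, where $a_{\underline{i}} := a_{i_1}\cdot\ldots\cdot a_{i_n}$, $P_{\underline{i}} := P_{i_1}\otimes\ldots\otimes P_{i_n}$, etc. In this way, we have

$$\mathcal{X}_{A^{\otimes n},B^{\otimes n}} = \mathcal{X}_{A,B}^n, \qquad p_{A^{\otimes n},B^{\otimes n}} = p_{A,B}^{\otimes n}, \qquad q_{A^{\otimes n},B^{\otimes n}} = q_{A,B}^{\otimes n}.$$

The above mapping of pairs of positive semidefinite operators to pairs of classical positive measures was used to prove the optimality of the quantum Chernoff bound in [32] and subsequently the optimality of the quantum Hoeffding bound in [30], by mapping the quantum state discrimination problem into a classical one. We will use the same approach to give lower bounds on the mixed error probabilities in Section 4. For given $A$, $B$, and every $t \in \mathbb{R}$, define a probability measure $\mu_t$ on $\mathcal{X}$ as

$$\mu_t(i,j) := \frac{1}{Z(t)}\,p(i,j)^t q(i,j)^{1-t}, \qquad (i,j) \in \mathcal{X},$$

where $\mathcal{X}, p, q$ are given as above, and $Z(t) = Z_{A,B}(t) = \sum_{(i,j)\in\mathcal{X}} p(i,j)^t q(i,j)^{1-t}$, $t \in \mathbb{R}$.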

Lemma 2.1. The function $\psi$ is convex on $\mathbb{R}$, it is affine if and only if $q$ is a constant multiple of $p$ and otherwise $\psi''(t) > 0$ for all $t \in \mathbb{R}$.

Proof. A straightforward computation shows that

$$\psi'(t) = Z(t)^{-1}\sum_{(i,j)\in\mathcal{X}} p(i,j)^t q(i,j)^{1-t}\bigl(\log p(i,j) - \log q(i,j)\bigr) = \mathbb{E}_{\mu_t} f, \tag{7}$$

$$\psi''(t) = \mathbb{E}_{\mu_t}\bigl(f^2\bigr) - \bigl(\mathbb{E}_{\mu_t} f\bigr)^2, \tag{8}$$

where $f(i,j) := \log p(i,j) - \log q(i,j)$, $(i,j) \in \mathcal{X}$, and $\mathbb{E}_{\mu_t}$ denotes the expectation value with respect to $\mu_t$. This shows that $\psi$ is convex on the whole real line, and $\psi''(t) = 0$ for some $t \in \mathbb{R}$ if and only if $f$ is constant, which is equivalent to $q$ being a constant multiple of $p$. Since this condition for a flat second derivative is independent of $t$, the assertion follows. □
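The two derivative formulas in the proof say that $\psi'$ and $\psi''$ are the mean and the variance of $f = \log p - \log q$ under the tilted measure $\mu_t$. A short numerical sketch of this statement follows, assuming measures with full common support given as lists; the names and the sample data are illustrative assumptions.

```python
import math


def psi(p, q, t):
    """psi(t) = log sum_x p(x)^t q(x)^(1-t) for strictly positive p and q."""
    return math.log(sum(px ** t * qx ** (1 - t) for px, qx in zip(p, q)))


def tilted_measure(p, q, t):
    """mu_t(x) = p(x)^t q(x)^(1-t) / Z(t)."""
    weights = [px ** t * qx ** (1 - t) for px, qx in zip(p, q)]
    z = sum(weights)
    return [w / z for w in weights]


def psi_derivatives(p, q, t):
    """Mean and variance of f = log p - log q under mu_t, i.e. psi'(t) and psi''(t)."""
    mu = tilted_measure(p, q, t)
    f = [math.log(px) - math.log(qx) for px, qx in zip(p, q)]
    mean = sum(m * fx for m, fx in zip(mu, f))
    variance = sum(m * (fx - mean) ** 2 for m, fx in zip(mu, f))
    return mean, variance


# Hypothetical example data.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
t = 0.3
first, second = psi_derivatives(p, q, t)

# Finite-difference approximations for comparison.
h1, h2 = 1e-5, 1e-3
fd_first = (psi(p, q, t + h1) - psi(p, q, t - h1)) / (2 * h1)
fd_second = (psi(p, q, t + h2) - 2 * psi(p, q, t) + psi(p, q, t - h2)) / h2 ** 2
```

When `q` is a constant multiple of `p` the function `f` is constant, so the variance vanishes and `psi` is affine, in line with the lemma.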

For a condition for a flat derivative of $\psi$ in terms of $A$ and $B$, see Lemma 3.2 in [21].

Corollary 2.2. If $\mathrm{Tr}\,A \le 1$ then the function $t \mapsto S_t(A\|B)$ is monotone increasing on $[0,1)$ and on $(1,+\infty)$. If $\mathrm{Tr}\,A = 1$ then we have

$$\lim_{t\to 1} S_t(A\|B) = S_1(A\|B) := S(A\|B) := \begin{cases} \mathrm{Tr}\,A(\log^* A - \log^* B), & \operatorname{supp}A \le \operatorname{supp}B, \\ +\infty, & \text{otherwise,} \end{cases}$$

where $\log^* x = \log x$, $x > 0$, and $\log^* 0 := 0$.

Proof. We have

$$\frac{d}{dt}S_t(A\|B) = \frac{\psi'(t)(t-1) - \psi(t)}{(t-1)^2} = -\frac{\psi(1)}{(t-1)^2} + \frac{1}{2}\psi''(\xi),$$

where $\xi$ is between $t$ and $1$. The first assertion then follows due to Lemma 2.1. If $\mathrm{Tr}\,A = 1$ then $\lim_{t\to 1} S_t(A\|B) = \psi'(1)$, which is easily seen to be equal to $S(A\|B)$. □

The quantity $S(A\|B)$ defined above is the relative entropy of $A$ with respect to $B$.
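Corollary 2.2 can be observed numerically for classical probability vectors. The sketch below is only an illustration under that assumption (full common support, hypothetical data and function names): it evaluates $S_t(p\|q)$ on a grid of parameters and compares the values near $t = 1$ with the relative entropy.

```python
import math


def renyi(p, q, t):
    """S_t(p||q) = log(sum_x p(x)^t q(x)^(1-t)) / (t - 1) for t != 1."""
    return math.log(sum(px ** t * qx ** (1 - t) for px, qx in zip(p, q))) / (t - 1)


def relative_entropy(p, q):
    """S(p||q) = sum_x p(x) (log p(x) - log q(x)) for strictly positive p and q."""
    return sum(px * (math.log(px) - math.log(qx)) for px, qx in zip(p, q))


# Hypothetical example data.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

below_one = [renyi(p, q, t) for t in (0.05, 0.2, 0.4, 0.6, 0.8, 0.95)]
above_one = [renyi(p, q, t) for t in (1.05, 1.5, 2.0, 3.0)]
s = relative_entropy(p, q)
```

The lists `below_one` and `above_one` are increasing, and every entry of the first lies below `s` while every entry of the second lies above it.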



The following Lemma complements Corollary 2.2.

Lemma 2.3. Assume that $\operatorname{supp}A \le \operatorname{supp}B$. For any $c > 0$ and $|t - 1| \le \delta := \min\left\{\frac{1}{2}, \frac{c}{2\log\eta}\right\}$, where

$$\eta := 1 + e^{\frac{1}{2}S_{3/2}(A\|B)} + e^{-\frac{1}{2}S_{1/2}(A\|B)}, \tag{9}$$

we have

$$S_t(A\|B) \ge S(A\|B) - (4\cosh c)(1-t)(\log\eta)^2, \qquad t \in (1-\delta, 1), \tag{10}$$

$$S_t(A\|B) \le S(A\|B) + (4\cosh c)(t-1)(\log\eta)^2, \qquad t \in (1, 1+\delta). \tag{11}$$

With the convention $S_1(A\|B) := S(A\|B)$, the above inequalities can be combined into

$$S_\beta(A\|B) \le S_t(A\|B) + (4\cosh c)(\log\eta)^2(\beta - t), \qquad 1-\delta < t \le 1 \le \beta < 1+\delta.$$

Proof. The inequality (11) was first given for conditional entropies in [37] and for relative entropies of states in [3H]. Exactly the same proof yields (11) for general positive semidefinite operators, and also the inequality (10). □

For an operator $X$ on a finite-dimensional Hilbert space, let $\|X\|_1 := \mathrm{Tr}\,|X| = \mathrm{Tr}\sqrt{X^*X}$ denote its trace-norm. The von Neumann entropy of a positive semidefinite operator $A$ is defined as $S(A) := -\mathrm{Tr}\,A\log A = -S(A\|I)$. The following is a sharpening of the Fannes inequality [13]; for a proof, see, e.g., ^ or

Lemma 2.4. For density operators $A$ and $B$ on a finite-dimensional Hilbert space $\mathcal{H}$,

$$|S(A) - S(B)| \le \frac{1}{2}\|A - B\|_1\log(\dim\mathcal{H} - 1) + h_2\bigl(\|A - B\|_1/2\bigr),$$

where $h_2(x) := -x\log x - (1-x)\log(1-x)$, $x \in [0,1]$.

For positive semidefinite operators $A$ and $B$, we define their Chernoff distance as

$$C(A\|B) := -\min_{0\le t\le 1}\log\mathrm{Tr}\,A^tB^{1-t} = \sup_{0<t<1}\{(1-t)\,S_t(A\|B)\}.$$

The following inequality between the trace-norm and the Chernoff distance was given in Theorem 1 of [2]; see also the simplified proof by N. Ozawa in [25].

Lemma 2.5. Let $A$ and $B$ be positive semidefinite operators on a finite-dimensional Hilbert space $\mathcal{H}$. Then

$$\frac{1}{2}\mathrm{Tr}(A + B) - \frac{1}{2}\mathrm{Tr}\,|A - B| \le \mathrm{Tr}\,A^tB^{1-t}, \qquad t \in [0,1],$$

or equivalently,

$$\frac{1}{2}\mathrm{Tr}(A + B) - \frac{1}{2}\|A - B\|_1 \le e^{-C(A\|B)}.$$
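In the commuting case the left-hand side of Lemma 2.5 reduces to $\sum_x \min(a_x, b_x)$, and the inequality follows from $\min(a,b) \le a^tb^{1-t}$. The sketch below checks this on randomly generated positive vectors; it is a sanity check of the classical special case only (the operator inequality itself requires the proof cited above), and the function names and the fixed seed are arbitrary choices.

```python
import random


def trace_norm_side(a, b):
    """(1/2) Tr(A + B) - (1/2) Tr|A - B| for commuting A, B with eigenvalue lists a, b."""
    return 0.5 * (sum(a) + sum(b)) - 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def power_side(a, b, t):
    """Tr A^t B^(1-t) for commuting A, B with strictly positive eigenvalue lists a, b."""
    return sum(x ** t * y ** (1 - t) for x, y in zip(a, b))


random.seed(0)  # fixed seed so that the check is reproducible
samples = [
    ([random.uniform(0.01, 2.0) for _ in range(4)], [random.uniform(0.01, 2.0) for _ in range(4)])
    for _ in range(100)
]
violations = [
    (a, b, t)
    for a, b in samples
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
    if trace_norm_side(a, b) > power_side(a, b, t) + 1e-12
]
```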

The above lemma was used to prove the achievability of the quantum Chernoff bound in [2], and subsequently the achievability of the quantum Hoeffding bound in [16]. We will recall these results in Section 4.

The Hoeffding distance of $A$ and $B$ with parameter $r \ge 0$ is defined as

$$H_r(A\|B) := \sup_{0\le t<1}\left\{S_t(A\|B) - \frac{tr}{1-t}\right\} = \sup_{0\le t<1}\frac{-tr - \log\mathrm{Tr}\,A^tB^{1-t}}{1-t}$$

(cf. Theorem 1.3 for the same expression for density operators). For every $a \in \mathbb{R}$, let

$$\varphi(a) := \max_{t\in[0,1]}\{ta - \psi(t)\}, \qquad \hat\varphi(a) := \max_{t\in[0,1]}\{(t-1)a - \psi(t)\} = \varphi(a) - a,$$

as in (5).
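The Hoeffding distance and the functions $\varphi$, $\hat\varphi$ are one-dimensional optimization problems, so they are easy to evaluate numerically for classical probability vectors. The sketch below assumes full common support and uses a plain ternary search, which relies on the objective being unimodal (it is concave after the substitution $s = t/(1-t)$); the function names and the test values are hypothetical.

```python
import math


def psi(p, q, t):
    """psi(t) = log sum_x p(x)^t q(x)^(1-t) for strictly positive p and q."""
    return math.log(sum(px ** t * qx ** (1 - t) for px, qx in zip(p, q)))


def argmax_unimodal(func, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def hoeffding_distance(p, q, r):
    """Return (H_r(p||q), t_r), with the supremum taken over t in [0, 1)."""
    def objective(t):
        return (-t * r - psi(p, q, t)) / (1 - t)
    t_r = argmax_unimodal(objective, 0.0, 1.0 - 1e-9)
    return objective(t_r), t_r


def phi(p, q, a):
    """phi(a) = max over t in [0,1] of t*a - psi(t)."""
    def objective(t):
        return t * a - psi(p, q, t)
    return objective(argmax_unimodal(objective, 0.0, 1.0))


def phi_hat(p, q, a):
    """phi_hat(a) = phi(a) - a."""
    return phi(p, q, a) - a


# Hypothetical example data.
p = [0.8, 0.2]
q = [0.3, 0.7]
r = 0.1
h_r, t_r = hoeffding_distance(p, q, r)
a_r = h_r - r
```

With `a_r = h_r - r` one finds `phi_hat(p, q, a_r)` equal to `r` and `phi(p, q, a_r)` equal to `h_r` up to numerical error, which is the relation used in Theorem 1.4.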

Lemma 2.6. (i) The function $r \mapsto H_r(A\|B)$ is convex and monotonic decreasing.

(ii) $\lim_{r\searrow 0} H_r(A\|B) = H_0(A\|B)$, and if $\mathrm{Tr}\,A = 1$ then $H_0(A\|B) = S(A\|B)$.

(iii) For every $-\psi(1) < r < -\psi(0) - \psi'(0)$ there exists a unique $t_r \in (0,1)$ such that

$$r = S(\mu_{t_r}\|p) = (t_r - 1)\psi'(t_r) - \psi(t_r), \qquad H_r(A\|B) = S(\mu_{t_r}\|q) = t_r\psi'(t_r) - \psi(t_r).$$

(iv) For every $r > -\psi(1)$ there is a unique $a_r \in \mathbb{R}$ such that

$$\hat\varphi(a_r) = r. \tag{12}$$

Moreover, $a_r = H_r(A\|B) - r$, and if $r < -\psi(0) - \psi'(0)$ then $a_r = \psi'(t_r)$ with the $t_r$ given in (iii).



Proof. The first assertion is obvious from the definition, and the second identity in (ii) follows immediately from Corollary 2.2. Note that

$$H_r(A\|B) = \sup_{0\le t<1}\frac{-tr - \psi(t)}{1-t} = \sup_{s\ge 0}\{-sr - \tilde\psi(s)\},$$

where $\tilde\psi(s) := (1+s)\,\psi\!\left(\frac{s}{1+s}\right)$, and hence the function $r \mapsto H_r(A\|B)$ is essentially the Legendre transform of $\tilde\psi$. By Proposition 4.1 and Corollary 4.1 in [12], $\tilde\psi^*$ is lower semicontinuous, and hence $\liminf_{r\searrow 0} H_r(A\|B) \ge H_0(A\|B) \ge \lim_{r\searrow 0} H_r(A\|B)$, where the second inequality is due to the monotonicity in $r$. This gives the first identity in (ii).

Convexity of $\psi$ yields that $\psi(0) + \psi'(0) \le \psi(1)$ and equality holds if and only if $\psi$ is affine, in which case the assertion in (iii) is empty and hence for the rest we assume $\psi''(t) > 0$, $t \in \mathbb{R}$. By the definition of $\tilde\psi$,

$$\tilde\psi'(s) = \psi\!\left(\frac{s}{1+s}\right) + \frac{1}{1+s}\,\psi'\!\left(\frac{s}{1+s}\right) \qquad \text{and} \qquad \tilde\psi''(s) = \frac{1}{(1+s)^3}\,\psi''\!\left(\frac{s}{1+s}\right),$$

and hence $\tilde\psi$ is also convex. Note that $\tilde\psi'(0) = \psi(0) + \psi'(0)$ and $\lim_{s\to+\infty}\tilde\psi'(s) = \psi(1)$, and hence,

$$H_r(A\|B) = \sup_{s\ge 0}\{-sr - \tilde\psi(s)\} = \begin{cases} -\tilde\psi(0) = -\psi(0), & -r \le \psi(0) + \psi'(0), \\ +\infty, & -r > \psi(1). \end{cases}$$

On the other hand, for any $-\psi(1) < r < -\psi(0) - \psi'(0)$ there exists a unique $s_r > 0$ such that

$$-r = \tilde\psi'(s_r) = \psi(t_r) + (1 - t_r)\psi'(t_r) \qquad \text{and} \qquad H_r(A\|B) = -s_rr - \tilde\psi(s_r) = -\psi(t_r) + t_r\psi'(t_r),$$

where $t_r := s_r/(1 + s_r) \in (0,1)$. The identities

$$S(\mu_t\|p) = (t-1)\psi'(t) - \psi(t), \qquad S(\mu_t\|q) = t\psi'(t) - \psi(t), \qquad t \in \mathbb{R},$$

follow by a straightforward computation. This proves (iii). For $-\psi(1) < r < -\psi(0) - \psi'(0)$, (iv) is an immediate consequence of (iii). For the general case, see e.g. Theorem 4.8 in [21]. □

Remark 2.7. The equation of the tangent line of $\psi$ at point $t$ is $l(x) := \psi(t) + (x - t)\psi'(t)$. Hence, $\psi(t) - t\psi'(t) = -S(\mu_t\|q)$ is its intersection with the $y$ axis and $\psi(t) - (t-1)\psi'(t) = -S(\mu_t\|p)$ is its intersection with the $x = 1$ line.

Remark 2.8. Note that

$$\psi(0) = \log\mathrm{Tr}\,A^0B = \log q(\mathcal{X}), \quad \text{and if } A^0 \ge B^0 \text{ then } \psi'(0) = -S(B\|A)/\mathrm{Tr}\,B,$$

$$\psi(1) = \log\mathrm{Tr}\,AB^0 = \log p(\mathcal{X}), \quad \text{and if } A^0 \le B^0 \text{ then } \psi'(1) = S(A\|B)/\mathrm{Tr}\,A.$$

Remark 2.9. It was shown in [23] that

$$H_r(p\|q) = \inf\{S(\mu\|q) : S(\mu\|p) \le r\},$$

where $p$ and $q$ are probability distributions on some finite set $\mathcal{X}$, and $\mu_{t_r}$ with the $t_r$ given in Lemma 2.6 is a unique minimizer in the above expression. However, the above representation of the Hoeffding distance does not hold in the quantum case. Indeed, it was shown in [151 E3] that for density operators $\rho$ and $\sigma$,

$$\inf\{S(\omega\|\sigma) : \omega \text{ is a density operator},\ S(\omega\|\rho) \le r\} = \sup_{0\le t<1}\frac{-tr - \log\mathrm{Tr}\,e^{t\log\rho + (1-t)\log\sigma}}{1-t} \ge H_r(\rho\|\sigma),$$

where the inequality is due to the Golden-Thompson inequality (see, e.g., Theorem IX.3.7 in [6]), and is in general strict.

Although the Chernoff distance and the Hoeffding distances don't satisfy the axioms of a metric on the set of density operators (the Chernoff distance is symmetric but does not satisfy the triangle inequality, while the Hoeffding distances are not even symmetric), the Lemma below gives some motivation why they are called "distances".

Lemma 2.10. If $\mathrm{Tr}\,A \le 1$ and $\mathrm{Tr}\,B \le 1$ then

$$S_t(A\|B) \ge 0, \qquad C(A\|B) \ge 0, \qquad H_r(A\|B) \ge 0$$

for every $t \in (0, +\infty) \setminus \{1\}$ and every $r \ge 0$. Moreover, the above inequalities are strict unless $A = B$ and $\mathrm{Tr}\,A = 1$ or $r \ge -\psi(0) - \psi'(0)$.

Proof. Hölder's inequality (see Corollary IV.2.6 in [6]) yields that $\mathrm{Tr}\,A^tB^{1-t} \le (\mathrm{Tr}\,A)^t(\mathrm{Tr}\,B)^{1-t}$ for every $t \in [0,1]$, from which the assertions follow easily, taking into account the previous Lemmas. □

2.2 Types 

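Let $\mathcal{X}$ be a finite set and let $\mathcal{M}(\mathcal{X})$ denote the set of non-zero positive measures on $\mathcal{X}$ and $\mathcal{M}_1(\mathcal{X})$ the set of probability measures on $\mathcal{X}$. We will identify positive measures with positive semidefinite operators as described in the previous subsection. For $\mu \in \mathcal{M}_1(\mathcal{X})$ let $S(\mu) := -\sum_{x\in\mathcal{X}}\mu(x)\log\mu(x)$ be its entropy, and for $\mu_1, \mu_2 \in \mathcal{M}(\mathcal{X})$ let the relative entropy of $\mu_1$ and $\mu_2$ be defined as $S(\mu_1\|\mu_2) := \sum_{x\in\mathcal{X}}\mu_1(x)\log\frac{\mu_1(x)}{\mu_2(x)}$ if $\operatorname{supp}\mu_1 \le \operatorname{supp}\mu_2$, and $+\infty$ otherwise.

For a sequence $\underline{x} \in \mathcal{X}^n$, the type of $\underline{x}$ is the probability distribution given by

$$T_{\underline{x}}(y) := \frac{1}{n}\,|\{k : x_k = y\}|, \qquad y \in \mathcal{X},$$

where $|H|$ denotes the cardinality of a set $H$. Note that $T_{\underline{x}} = T_{\underline{y}}$ if and only if $\underline{x}$ is a permutation of $\underline{y}$. Obviously, if $\mu \in \mathcal{M}(\mathcal{X})$ then the measure of an $\underline{x} \in \mathcal{X}^n$ with respect to $\mu^{\otimes n}$ only depends on the type of $\underline{x}$, and one can easily see that

$$\mu^{\otimes n}(\underline{x}) = \prod_{y\in\mathcal{X}}\mu(y)^{nT_{\underline{x}}(y)} = e^{-n\left(S(T_{\underline{x}}) + S(T_{\underline{x}}\|\mu)\right)}.$$

In particular,

$$T_{\underline{x}}^{\otimes n}(\underline{x}) = e^{-nS(T_{\underline{x}})}, \qquad \text{and} \qquad \mu^{\otimes n}(\underline{x}) = T_{\underline{x}}^{\otimes n}(\underline{x})\,e^{-nS(T_{\underline{x}}\|\mu)}. \tag{13}$$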
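The identities relating the probability of a sequence to the entropy and relative entropy of its type are easy to verify on a small example. The following sketch uses a three-letter alphabet and a probability measure stored as a dictionary; the helper names and the sample sequence are assumptions made for illustration.

```python
import math
from collections import Counter


def type_of(seq, alphabet):
    """Empirical distribution (type) of a sequence over the given alphabet."""
    n = len(seq)
    counts = Counter(seq)
    return {y: counts[y] / n for y in alphabet}


def entropy(mu):
    """Shannon entropy, with the convention 0 log 0 = 0."""
    return -sum(m * math.log(m) for m in mu.values() if m > 0)


def relative_entropy(mu1, mu2):
    """S(mu1||mu2); +infinity if supp mu1 is not contained in supp mu2."""
    total = 0.0
    for x, m in mu1.items():
        if m > 0:
            if mu2[x] == 0:
                return math.inf
            total += m * math.log(m / mu2[x])
    return total


def product_measure(mu, seq):
    """mu^{(x)n}(seq) = product over k of mu(seq[k])."""
    return math.prod(mu[x] for x in seq)


# Hypothetical example data.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
seq = "aabac"
n = len(seq)
t_x = type_of(seq, "abc")

direct = product_measure(mu, seq)
via_type = math.exp(-n * (entropy(t_x) + relative_entropy(t_x, mu)))
```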

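A variant of the following bound can be found in [23]. For readers' convenience we provide a complete proof here.

Lemma 2.11. Let $\underline{x} \in \mathcal{X}^n$ and $r := |\operatorname{supp}T_{\underline{x}}|$. Then,

$$\frac{1}{n}\log T_{\underline{x}}^{\otimes n}\bigl(\{\underline{y} : T_{\underline{y}} = T_{\underline{x}}\}\bigr) \ge -\frac{r-1}{2}\,\frac{\log n}{n} + \frac{r}{n}\left(\log\sqrt{\frac{r}{2\pi}} - \frac{1}{12}\right).$$

Proof. Let $z_1, \ldots, z_r$ be an ordering of the elements of $\operatorname{supp}T_{\underline{x}}$, and let $k_i := nT_{\underline{x}}(z_i)$. Then

$$|\{\underline{y} : T_{\underline{y}} = T_{\underline{x}}\}| = \frac{n!}{k_1!\cdot\ldots\cdot k_r!}, \qquad T_{\underline{x}}^{\otimes n}(\underline{y}) = \prod_{i=1}^{r}(k_i/n)^{k_i}, \quad T_{\underline{y}} = T_{\underline{x}}.$$

By Stirling's formula (see, e.g., [E]),

$$(m/e)^m\sqrt{2\pi m}\;e^{1/(12m+1)} \le m! \le (m/e)^m\sqrt{2\pi m}\;e^{1/(12m)},$$

and hence,

$$P_n := T_{\underline{x}}^{\otimes n}\bigl(\{\underline{y} : T_{\underline{y}} = T_{\underline{x}}\}\bigr) = \frac{n!}{k_1!\cdot\ldots\cdot k_r!}\prod_{i=1}^{r}\left(\frac{k_i}{n}\right)^{k_i} \ge \frac{\sqrt{2\pi n}}{(2\pi)^{r/2}\sqrt{k_1\cdot\ldots\cdot k_r}}\,\exp\bigl(1/(12n+1) - 1/(12k_1) - \ldots - 1/(12k_r)\bigr).$$

Using $\sqrt[r]{k_1\cdot\ldots\cdot k_r} \le \frac{k_1 + \ldots + k_r}{r} = \frac{n}{r}$, we have $\sqrt{k_1\cdot\ldots\cdot k_r} \le (n/r)^{r/2}$, while $k_i \ge 1$ yields $1/k_1 + \ldots + 1/k_r \le r$, and hence,

$$P_n \ge \frac{\sqrt{2\pi n}}{(2\pi)^{r/2}}\,(r/n)^{r/2}\exp\left(\frac{1}{12n+1} - \frac{r}{12}\right) \ge \left(\sqrt{\frac{r}{2\pi}}\right)^{r} n^{1/2 - r/2}\exp\left(\frac{1}{12n+1} - \frac{r}{12}\right),$$

which yields

$$\frac{1}{n}\log P_n \ge -\frac{r-1}{2}\,\frac{\log n}{n} + \frac{r}{n}\left(\log\sqrt{\frac{r}{2\pi}} - \frac{1}{12}\right) + \frac{1}{n(12n+1)}. \qquad □$$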
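The bound of Lemma 2.11 can be compared with the exact probability of a type class, which is a multinomial probability. The sketch below does this using log-gamma functions; it encodes the bound in the form $-\frac{r-1}{2}\frac{\log n}{n} + \frac{r}{n}\bigl(\log\sqrt{r/(2\pi)} - \frac{1}{12}\bigr)$, which is the form that follows from the Stirling estimate in the proof and should be treated as an assumption where the statement of the lemma is hard to read. The function names are hypothetical.

```python
import math


def log_type_class_probability(counts):
    """log of the T_x^{(x)n}-probability of the type class with the given positive counts."""
    n = sum(counts)
    # log of the multinomial coefficient n! / (k_1! ... k_r!)
    log_size = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    # log-probability of a single sequence of this type under its own type
    log_each = sum(k * math.log(k / n) for k in counts)
    return log_size + log_each


def type_class_lower_bound(counts):
    """Assumed form of the lower bound on (1/n) log P_n, with r the number of counts."""
    n = sum(counts)
    r = len(counts)
    return -(r - 1) / 2 * math.log(n) / n + (r / n) * (0.5 * math.log(r / (2 * math.pi)) - 1 / 12)


# Hypothetical example types, given by their counts k_1, ..., k_r.
examples = [[1], [1, 1], [3, 2, 1], [50, 30, 20], [10, 10, 10, 10], [997, 2, 1]]
gaps = [
    log_type_class_probability(c) / sum(c) - type_class_lower_bound(c) for c in examples
]
```

All entries of `gaps` are nonnegative; they shrink as $n$ grows, reflecting that the type class probability decays only polynomially in $n$.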

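Let $\mathcal{T}_n$ denote the collection of all types arising from length $n$ sequences, i.e., $\mathcal{T}_n := \{T_{\underline{x}} : \underline{x} \in \mathcal{X}^n\}$. It is known that $\cup_{n\in\mathbb{N}}\mathcal{T}_n$ is dense in $\mathcal{M}_1(\mathcal{X})$, and $\inf_{\nu\in\mathcal{T}_n}\|\mu - \nu\|_1 \le \frac{|\mathcal{X}|}{n}$ for any $\mu \in \mathcal{M}_1(\mathcal{X})$; see, e.g., [H]. Moreover, the following has been shown in Lemma A.2 of [23]:

Lemma 2.12. Let $v \in \mathbb{R}^{\mathcal{X}}$ and $c \in \mathbb{R}$, and assume that the half-spaces $H_1 := \{f \in \mathbb{R}^{\mathcal{X}} : \sum_{x\in\mathcal{X}} f(x)v(x) < c\}$, $H_2 := \{f \in \mathbb{R}^{\mathcal{X}} : \sum_{x\in\mathcal{X}} f(x)v(x) > c\}$ have non-trivial intersections with $\mathcal{M}_1(\mathcal{X})$. Then for every $\mu \in \mathcal{M}_1(\mathcal{X})$ such that $\sum_{x\in\mathcal{X}}\mu(x)v(x) = c$ and every $n > r(r-1)$, where $r := |\operatorname{supp}\mu|$, there exist types $\mu_1 \in H_1 \cap \mathcal{T}_n$ and $\mu_2 \in H_2 \cap \mathcal{T}_n$ such that

$$\max\{\|\mu - \mu_1\|_1, \|\mu - \mu_2\|_1\} \le \frac{2(r-1)}{n}.$$

For more about types and their applications in information theory, see e.g., [10].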

3 Optimal Type II errors 

Consider the state discrimination problem described in the Introduction. In this section, we will give bounds on the error probabilities $\beta_{n,\varepsilon}$ and $\beta_{n,e^{-nr}}$. The key technical tool will be the following lemma about the duality of linear programming, known as Slater's condition; for a proof, see Problem 4 in Section 7.2 of [3].

Lemma 3.1. Let $V_1$ and $V_2$ be real inner product spaces and let $K_i$ be a convex cone in $V_i$, $i = 1, 2$. The dual cone $K_i^*$ is defined as $K_i^* := \{y \in V_i : \langle y, x\rangle \ge 0,\ x \in K_i\}$. Let $c \in V_1$, $b \in V_2$ and let $A : V_1 \to V_2$ be a linear map. Assume that there exists a $v$ in the interior of $K_1$ such that $Av - b$ is in the interior of $K_2$. Then the following two quantities are equal:

$$\gamma^p := \inf\{\langle c, v\rangle : v \ge_{K_1} 0,\ Av \ge_{K_2} b\},$$

$$\gamma^d := \sup\{\langle b, w\rangle : w \ge_{K_2^*} 0,\ A^*w \le_{K_1^*} c\}.$$

Using Lemma 3.1, we can give the following alternative characterization of the optimal type II error:

Proposition 3.2. For every $\varepsilon \in (0,1)$, we have

$$\beta_{1,\varepsilon} = \sup_{\lambda\ge 0}\{(1-\varepsilon)\lambda - \mathrm{Tr}(\lambda\rho - \sigma)_+\} = \sup_{\lambda\ge 0}\left\{\frac{1+\lambda}{2} - \frac{1}{2}\|\lambda\rho - \sigma\|_1 - \lambda\varepsilon\right\} \tag{14}$$

$$\le \sup_{\lambda\ge 0}\{\lambda^t\,\mathrm{Tr}\,\rho^t\sigma^{1-t} - \lambda\varepsilon\}, \qquad t \in [0,1]. \tag{15}$$

Moreover, for every $n \in \mathbb{N}$ and every $t \in [0,1)$,

$$\frac{1}{n}\log\beta_{n,\varepsilon} \le -S_t(\rho\|\sigma) + \frac{1}{n}\,\frac{t}{1-t}\log\varepsilon^{-1} - \frac{1}{n}\,\frac{h_2(t)}{1-t}, \tag{16}$$

where $h_2(t) := -t\log t - (1-t)\log(1-t)$, $t \in [0,1]$.
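For two classical binary sources the optimal type II error $\beta_{n,\varepsilon}$ can be computed exactly, which gives a way to see how loose or tight the bound (16) is. The sketch below assumes Bernoulli sources with parameters `p1` and `q1` strictly between 0 and 1 and finds the optimal (randomized) test by a greedy fractional-knapsack argument over the number of ones in the sequence; all names and sample values are illustrative assumptions.

```python
import math


def optimal_type_two_error(p1, q1, n, eps):
    """Exact beta_{n,eps} for Bernoulli(p1)^n (null) against Bernoulli(q1)^n (alternative)."""
    groups = []
    for k in range(n + 1):
        weight = math.comb(n, k)
        p_mass = weight * p1 ** k * (1 - p1) ** (n - k)
        q_mass = weight * q1 ** k * (1 - q1) ** (n - k)
        groups.append((q_mass / p_mass, p_mass, q_mass))
    groups.sort()  # accept the null first where q/p is smallest
    needed = 1.0 - eps  # null-probability mass the acceptance region must carry
    beta = 0.0
    for _, p_mass, q_mass in groups:
        if needed <= 0:
            break
        fraction = min(1.0, needed / p_mass)  # randomize on the boundary group
        beta += fraction * q_mass
        needed -= fraction * p_mass
    return beta


def renyi_binary(p1, q1, t):
    """S_t between Bernoulli(p1) and Bernoulli(q1) for t != 1."""
    z = p1 ** t * q1 ** (1 - t) + (1 - p1) ** t * (1 - q1) ** (1 - t)
    return math.log(z) / (t - 1)


def binary_entropy(t):
    """h_2(t) for t strictly between 0 and 1."""
    return -t * math.log(t) - (1 - t) * math.log(1 - t)


def stein_upper_bound(p1, q1, n, eps, t):
    """Right-hand side of (16) for t in (0, 1)."""
    return (
        -renyi_binary(p1, q1, t)
        + t / (1 - t) * math.log(1 / eps) / n
        - binary_entropy(t) / (n * (1 - t))
    )


# Hypothetical example parameters.
p1, q1, eps = 0.8, 0.3, 0.1
exact_rate = math.log(optimal_type_two_error(p1, q1, 20, eps)) / 20
best_bound = min(stein_upper_bound(p1, q1, 20, eps, t / 100) for t in range(1, 100))
```

Since (16) holds for every $t \in [0,1)$, the useful quantity in practice is its minimum over $t$, approximated here by `best_bound` on a grid.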

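Proof. Let $\tilde\rho$ and $\tilde\sigma$ be density operators on some finite-dimensional Hilbert space $\mathcal{H}$, and for each $\varepsilon > 0$ define

$$\beta_\varepsilon := \min\{\mathrm{Tr}\,\tilde\sigma T : 0 \le T \le I,\ \mathrm{Tr}\,\tilde\rho(I - T) \le \varepsilon\},$$

which is the optimal type II error for discriminating between $\tilde\rho$ and $\tilde\sigma$ under the constraint that the type I error doesn't exceed $\varepsilon$. We apply Lemma 3.1 to give an alternative expression for $\beta_\varepsilon$. To this end, we define

$$V_1 := \mathcal{B}(\mathcal{H})_{sa}, \qquad c := \tilde\sigma, \qquad V_2 := \mathcal{B}(\mathcal{H})_{sa} \oplus \mathbb{R}, \qquad b := -I \oplus (1 - \varepsilon),$$

where $\mathcal{B}(\mathcal{H})_{sa}$ is the real linear vector space of self-adjoint operators on $\mathcal{H}$. We equip both $V_1$ and $V_2$ with the Hilbert-Schmidt inner product, and define $K_1$ and $K_2$ to be the self-dual cones of the positive semidefinite operators. If we define $A$ to be $A : X \mapsto -X \oplus \mathrm{Tr}\,\tilde\rho X$ then $A^*$ is given by $A^* : X \oplus \lambda \mapsto -X + \lambda\tilde\rho$, and we see that $\gamma^p = \beta_\varepsilon$. It is easy to verify that the condition of Lemma 3.1 is satisfied in this case, and hence

$$\beta_\varepsilon = \gamma^p = \gamma^d = \sup\{-\mathrm{Tr}\,X + \lambda(1 - \varepsilon) : X \ge 0,\ \lambda \ge 0,\ -X + \lambda\tilde\rho \le \tilde\sigma\}.$$

For a fixed $\lambda \ge 0$, we have

$$\inf\{\mathrm{Tr}\,X : X \ge 0,\ \lambda\tilde\rho - \tilde\sigma \le X\} = \mathrm{Tr}(\lambda\tilde\rho - \tilde\sigma)_+ = \frac{1}{2}\mathrm{Tr}(\lambda\tilde\rho - \tilde\sigma) + \frac{1}{2}\|\lambda\tilde\rho - \tilde\sigma\|_1$$

(the first identity can also be seen by a duality argument). Hence, we have

$$\beta_\varepsilon = \sup_{\lambda\ge 0}\{(1-\varepsilon)\lambda - \mathrm{Tr}(\lambda\tilde\rho - \tilde\sigma)_+\} = \sup_{\lambda\ge 0}\left\{\frac{1+\lambda}{2} - \frac{1}{2}\|\lambda\tilde\rho - \tilde\sigma\|_1 - \lambda\varepsilon\right\} \le \sup_{\lambda\ge 0}\{\lambda^t\,\mathrm{Tr}\,\tilde\rho^t\tilde\sigma^{1-t} - \lambda\varepsilon\}, \qquad t \in [0,1],$$

where the last inequality is due to Lemma 2.5. Choosing $\tilde\rho = \rho$ and $\tilde\sigma = \sigma$ gives (14) and (15).

Note that $f(\lambda) := \lambda^t\,\mathrm{Tr}\,\tilde\rho^t\tilde\sigma^{1-t} - \lambda\varepsilon$ is concave, and hence if $f(\lambda)$ has a stationary point $\lambda^*$ then this is automatically a global maximum. Solving $f'(\lambda^*) = 0$ in the case $t \ne 1$, we get

$$\lambda^* = \left(\frac{t\,\mathrm{Tr}\,\tilde\rho^t\tilde\sigma^{1-t}}{\varepsilon}\right)^{\frac{1}{1-t}},$$

and substituting it back, we get

$$\log\beta_\varepsilon \le \log f(\lambda^*) = -\frac{t\log\varepsilon - \log\mathrm{Tr}\,\tilde\rho^t\tilde\sigma^{1-t}}{1-t} - \frac{h_2(t)}{1-t}, \qquad t \in [0,1).$$

Choosing now $\tilde\rho = \rho^{\otimes n}$, $\tilde\sigma = \sigma^{\otimes n}$, we obtain

$$\frac{1}{n}\log\beta_{n,\varepsilon} \le -\frac{(t/n)\log\varepsilon - \log\mathrm{Tr}\,\rho^t\sigma^{1-t}}{1-t} - \frac{1}{n}\,\frac{h_2(t)}{1-t}, \qquad t \in [0,1), \tag{17}$$

which is equivalent to (16). □

Theorem 3.3. For every $\varepsilon \in (0,1)$ and every $n \in \mathbb{N}$, we have

$$\frac{1}{n}\log\beta_{n,\varepsilon} \le -S(\rho\|\sigma) + \frac{1}{\sqrt{n}}\,4\sqrt{2}\,\log\varepsilon^{-1}\log\eta - \frac{2\log 2}{n}, \tag{18}$$

$$\frac{1}{n}\log\beta_{n,\varepsilon} \ge -S(\rho\|\sigma) - \frac{1}{\sqrt{n}}\,4\sqrt{2}\,\log(1-\varepsilon)^{-1}\log\eta - \frac{1}{n}\log(1-\varepsilon)^{-1}, \tag{19}$$

where $\eta := 1 + e^{\frac{1}{2}S_{3/2}(\rho\|\sigma)} + e^{-\frac{1}{2}S_{1/2}(\rho\|\sigma)}$, as in (9). Moreover, for every $n \in \mathbb{N}$ and every $r > -\log\mathrm{Tr}\,\rho\sigma^0$ we have

$$\frac{1}{n}\log\beta_{n,e^{-nr}} \le -H_r(\rho\|\sigma) - \frac{1}{n}\,\frac{h_2(t_r)}{1-t_r}, \tag{20}$$

where $t_r := \operatorname{argmax}_{0\le t<1}\frac{-tr - \log\mathrm{Tr}\,\rho^t\sigma^{1-t}}{1-t}$, and $t_r > 0$ when $r < -\psi(0) - \psi'(0)$.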

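Proof. The upper bound (16) with the choice $\varepsilon = e^{-nr}$ yields

$$\frac{1}{n}\log\beta_{n,e^{-nr}} \le -\frac{-tr - \log\mathrm{Tr}\,\rho^t\sigma^{1-t}}{1-t} - \frac{1}{n}\,\frac{h_2(t)}{1-t}, \qquad t \in [0,1). \tag{21}$$

If $r > -\log\mathrm{Tr}\,\rho\sigma^0$ then there exists a $t_r \in [0,1)$ such that

$$\frac{-rt_r - \log\mathrm{Tr}\,\rho^{t_r}\sigma^{1-t_r}}{1-t_r} = \max_{0\le t<1}\frac{-rt - \log\mathrm{Tr}\,\rho^t\sigma^{1-t}}{1-t} = H_r(\rho\|\sigma).$$

This follows from Lemma 2.6 when $r < -\psi(0) - \psi'(0)$, where $\psi(t) := \log\mathrm{Tr}\,\rho^t\sigma^{1-t}$, and for $r \ge -\psi(0) - \psi'(0)$ we have $t_r = 0$. With this $t_r$, (21) yields (20).

Next, we apply Lemma 2.3 with $A := \rho$ and $B := \sigma$ to the upper bound (16) to get

$$\frac{1}{n}\log\beta_{n,\varepsilon} \le -S(\rho\|\sigma) + (1-t)\,4(\cosh c)(\log\eta)^2 + \frac{-\log\varepsilon}{n}\,\frac{t}{1-t} - \frac{1}{n}\,\frac{h_2(t)}{1-t}$$

$$\le -S(\rho\|\sigma) + (1-t)\,4(\cosh c)(\log\eta)^2 + \frac{-\log\varepsilon}{n}\,\frac{1}{1-t} - \frac{2\log 2}{n},$$

which is valid for $1 - \delta < t < 1$. Now let us choose $t = 1 - a/\sqrt{n}$ for some $a > 0$; then we have

$$\frac{1}{n}\log\beta_{n,\varepsilon} \le -S(\rho\|\sigma) + \frac{a}{\sqrt{n}}\,4(\cosh c)(\log\eta)^2 + \frac{-\log\varepsilon}{\sqrt{n}}\,\frac{1}{a} - \frac{2\log 2}{n},$$

and optimizing over $a$ yields

$$\frac{1}{n}\log\beta_{n,\varepsilon} \le -S(\rho\|\sigma) + \frac{2}{\sqrt{n}}\sqrt{4(\cosh c)(\log\eta)^2\log\varepsilon^{-1}} - \frac{2\log 2}{n}, \tag{22}$$

where the optimum is reached at $a^* = \sqrt{\frac{\log\varepsilon^{-1}}{4(\cosh c)(\log\eta)^2}}$. The above upper bound is valid as long as $1 - a^*/\sqrt{n} > 1 - \delta$, or equivalently,

$$n > 4(a^*)^2 = \frac{\log\varepsilon^{-1}}{(\cosh c)(\log\eta)^2} \qquad \text{and} \qquad n > 4(a^*)^2(\log\eta)^2/c^2 = \frac{\log\varepsilon^{-1}}{c^2\cosh c}. \tag{23}$$

Let us now choose $c$ such that $\cosh c = 2\log\varepsilon^{-1}$. Then it is easy to see that $c = \operatorname{arcosh}(2\log\varepsilon^{-1}) = \log\left(2\log\varepsilon^{-1} + \sqrt{(2\log\varepsilon^{-1})^2 - 1}\right) > 1$. Since we also have $\log\eta > 1$, we see that both of the lower bounds in (23) are less than 1, i.e., the upper bound in (22) is valid for all $n \in \mathbb{N}$ with $\cosh c = 2\log\varepsilon^{-1}$, which yields (18).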

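To prove (19), we apply the idea of [29] to use the monotonicity of the Rényi relative entropies to get a lower bound on $\beta_{n,\varepsilon}$. Let $T$ be any test such that $\alpha_n(T) = \mathrm{Tr}\,\rho_n(I - T) \le \varepsilon$; then for every $t \in (1,2]$ we have

$$\mathrm{Tr}\,\rho_n^t\sigma_n^{1-t} \ge (\mathrm{Tr}\,\rho_nT)^t(\mathrm{Tr}\,\sigma_nT)^{1-t} + (\mathrm{Tr}\,\rho_n(I-T))^t(\mathrm{Tr}\,\sigma_n(I-T))^{1-t} \ge (\mathrm{Tr}\,\rho_nT)^t(\mathrm{Tr}\,\sigma_nT)^{1-t} \ge (1-\varepsilon)^t(\mathrm{Tr}\,\sigma_nT)^{1-t}.$$

Taking the logarithm and rearranging then yields

$$\log\mathrm{Tr}\,\sigma_nT \ge -S_t(\rho_n\|\sigma_n) - \frac{t}{t-1}\log(1-\varepsilon)^{-1}.$$

Taking now the infimum over all $T$ such that $\alpha_n(T) \le \varepsilon$, and using Lemma 2.3, we obtain

$$\frac{1}{n}\log\beta_{n,\varepsilon} \ge -\frac{1}{n}S_t(\rho_n\|\sigma_n) - \frac{1}{n}\,\frac{t}{t-1}\log(1-\varepsilon)^{-1} \ge -S(\rho\|\sigma) - (4\cosh c)(t-1)(\log\eta)^2 - \frac{1}{n}\,\frac{t}{t-1}\log(1-\varepsilon)^{-1}.$$

Again, let $t := 1 + a/\sqrt{n}$; then

$$\frac{1}{n}\log\beta_{n,\varepsilon} \ge -S(\rho\|\sigma) - \frac{1}{\sqrt{n}}\left(a\,(4\cosh c)(\log\eta)^2 + \frac{\log(1-\varepsilon)^{-1}}{a}\right) - \frac{1}{n}\log(1-\varepsilon)^{-1},$$

and optimizing over $a$ yields

$$\frac{1}{n}\log\beta_{n,\varepsilon} \ge -S(\rho\|\sigma) - \frac{2}{\sqrt{n}}\sqrt{(4\cosh c)(\log\eta)^2\log(1-\varepsilon)^{-1}} - \frac{1}{n}\log(1-\varepsilon)^{-1},$$

where the optimum is reached at $a^* = \sqrt{\frac{\log(1-\varepsilon)^{-1}}{4(\cosh c)(\log\eta)^2}}$. This bound is valid as long as $1 < t < 1 + \delta$, or equivalently, if

$$n > 4(a^*)^2 = \frac{\log(1-\varepsilon)^{-1}}{(\cosh c)(\log\eta)^2} \qquad \text{and} \qquad n > 4(a^*)^2(\log\eta)^2/c^2 = \frac{\log(1-\varepsilon)^{-1}}{c^2\cosh c}.$$

Choosing $c = \operatorname{arcosh}(2\log(1-\varepsilon)^{-1})$, the same argument as above leads to (19). □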



Remark 3.4. The bounds in (18) and (19) yield immediately the quantum Stein's lemma, i.e., Theorem 1.2.

Remark 3.5. For any chosen pair of states $\rho$ and $\sigma$, the set of points $\{(\alpha(T), \beta(T)) : T \text{ test}\}$ forms a convex set, which we call the error set here, and the lower boundary of this set is what constitutes the sought-after optimal errors. It is easy to see that for any $\varepsilon \in (0,1)$, $\beta_\varepsilon := \min\{\mathrm{Tr}\,\sigma T : \mathrm{Tr}\,\rho(I - T) \le \varepsilon\}$ can be attained at a test for which $\mathrm{Tr}\,\rho(I - T) = \varepsilon$. It is also easy to see that there exists a $\lambda_\varepsilon > 0$ and a Neyman-Pearson test $T_\varepsilon$ such that $\{\lambda_\varepsilon\rho - \sigma > 0\} \le T_\varepsilon \le \{\lambda_\varepsilon\rho - \sigma \ge 0\}$, for which $\mathrm{Tr}\,\rho(I - T_\varepsilon) = \varepsilon$, and by the Neyman-Pearson lemma (see the Introduction), $\beta_\varepsilon = \mathrm{Tr}\,\sigma T_\varepsilon$. That is, all points on the lower boundary can be attained by Neyman-Pearson tests. Finally, we have the identity $\mathrm{Tr}\,\sigma T_\varepsilon = \lambda_\varepsilon\mathrm{Tr}\,\rho T_\varepsilon - \mathrm{Tr}(\lambda_\varepsilon\rho - \sigma)_+ = \lambda_\varepsilon(1 - \varepsilon) - \mathrm{Tr}(\lambda_\varepsilon\rho - \sigma)_+$; cf. formula (14).

Here, $\lambda_\varepsilon$ is related to the slope of the tangent line of the lower boundary at the point $(\alpha(T_\varepsilon), \beta(T_\varepsilon))$. In the next section we follow a different approach to scale the lower boundary of the error set by essentially fixing the slope of the tangent line and looking for the optimal errors corresponding to that slope; this is reached by minimizing the mixed error probabilities $e^{-na}\alpha_n(T) + \beta_n(T)$.



4 The mixed error probabilities 

Consider again the state discrimination problem described in the Introduction. For every $a \in \mathbb{R}$, let $e_n(a)$ be the mixed error probability as defined in (2), and let $\varphi(a)$ and $\hat\varphi(a)$ be as in (5). Note that $e_n(0)$ is twice the Chernoff error with equal priors $p = 1 - p = 1/2$, and for every $r > -\log\operatorname{Tr}\rho\sigma^0$, we have $\varphi(a_r) = H_r(\rho\|\sigma)$ and $\hat\varphi(a_r) = r$ for $a_r := H_r(\rho\|\sigma) - r$, due to Lemma 2.6.

Lemma 2.5 yields various upper bounds on the error probabilities. These have already been obtained in [2], [21], [30]. We repeat them here for completeness.

Proposition 4.1. For every $a \in \mathbb{R}$ and every $n \in \mathbb{N}$, we have

$$\frac1n\log e_n(a) \le -\varphi(a), \tag{24}$$

which in turn yields

$$\frac1n\log\alpha_n(T) \le -\hat\varphi(a), \qquad \frac1n\log\beta_n(T) \le -\varphi(a) \tag{25}$$

for every $T \in \mathcal{N}_{n,a}$. In particular, we have

$$\frac1n\log e_n(0) \le -C(\rho\|\sigma)$$

for the Chernoff error, and if $r > -\log\operatorname{Tr}\rho\sigma^0$ then we have

$$\frac1n\log e_n(a_r) \le -H_r(\rho\|\sigma), \quad\text{and}$$

$$\frac1n\log\alpha_n(T) \le -\hat\varphi(a_r) = -r, \qquad \frac1n\log\beta_n(T) \le -\varphi(a_r) = -H_r(\rho\|\sigma)$$

for every $T \in \mathcal{N}_{n,a_r}$, where $a_r = H_r(\rho\|\sigma) - r$.

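Proof. For fixed $a \in \mathbb{R}$ and $n \in \mathbb{N}$ let $T \in \mathcal{N}_{n,a}$. Then we have

$$e_n(a) = e^{-na}\alpha_n(T) + \beta_n(T) = \frac{1 + e^{-na}}{2} - \frac12\left\|e^{-na}\rho_n - \sigma_n\right\|_1 \le e^{-nat}\operatorname{Tr}\rho_n^{t}\sigma_n^{1-t} \tag{26}$$

for every $t \in [0,1]$, where the inequality is due to Lemma 2.5. Since $\operatorname{Tr}\rho_n^{t}\sigma_n^{1-t} = (\operatorname{Tr}\rho^{t}\sigma^{1-t})^n$, taking the infimum over $t \in [0,1]$ in (26) yields (24). The inequalities in (25) are immediate from $e^{-na}\alpha_n(T) \le e_n(a)$ and $\beta_n(T) \le e_n(a)$. The rest of the assertions follow as special cases. □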
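In the commuting (classical) case, the bound (24) can be verified directly, since Lemma 2.5 then reduces to the scalar inequality $\min\{x,y\} \le x^t y^{1-t}$. The sketch below does this for two binary distributions; the function names and the sample distributions are illustrative assumptions. The supremum defining $\varphi(a)$ is approximated on a grid, which can only under-estimate it, so the inequality is expected to hold for the grid value as well.

```python
import math

def psi(t, p, q):
    """psi(t) = log sum_x p(x)^t q(x)^(1-t), for strictly positive p and q."""
    return math.log(sum(px ** t * qx ** (1.0 - t) for px, qx in zip(p, q)))

def phi(a, p, q, grid=2000):
    """Grid approximation (from below) of phi(a) = sup_{0<=t<=1} {a*t - psi(t)}."""
    return max(a * (i / grid) - psi(i / grid, p, q) for i in range(grid + 1))

def mixed_error(n, a, p, q):
    """e_n(a) = sum over sequences x of min{e^{-na} p^n(x), q^n(x)} (binary alphabet)."""
    total = 0.0
    for k in range(n + 1):  # k = number of occurrences of the first symbol
        weight_p = p[0] ** k * p[1] ** (n - k)
        weight_q = q[0] ** k * q[1] ** (n - k)
        total += math.comb(n, k) * min(math.exp(-n * a) * weight_p, weight_q)
    return total

# Illustrative (assumed) pair of distributions.
p, q = (0.2, 0.8), (0.6, 0.4)

# Finite-n error rates (1/n) log e_n(a), to be compared with -phi(a).
rates = {(n, a): math.log(mixed_error(n, a, p, q)) / n
         for n in (5, 20, 80) for a in (-0.3, 0.0, 0.3)}
```

Comparing each entry of `rates` with `-phi(a, p, q)` shows how far the finite-$n$ rate sits below the asymptotic exponent.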

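To obtain lower bounds on the mixed error probabilities, we will use the mapping described in the beginning of Section 2 with $A := \rho$ and $B := \sigma$. Hence, we use the notation $\mathcal{X} := \mathcal{X}_{\rho,\sigma}$, $p := p_{\rho,\sigma}$ and $q := q_{\rho,\sigma}$. Note that $\operatorname{supp} p = \operatorname{supp} q = \mathcal{X}$ and $p(\mathcal{X}) \le 1$, $q(\mathcal{X}) \le 1$. For every $a \in \mathbb{R}$ and $n \in \mathbb{N}$, let

$$\tilde e_n(a) := \min\{e^{-na}p^{\otimes n}(\mathcal{X}^n \setminus T) + q^{\otimes n}(T) : T \subset \mathcal{X}^n\}.$$

It is easy to see that

$$\tilde e_n(a) = e^{-na}p^{\otimes n}(\mathcal{X}^n \setminus N_{n,a}) + q^{\otimes n}(N_{n,a}), \tag{27}$$

where

$$N_{n,a} := \left\{x \in \mathcal{X}^n : \frac1n\log\frac{p^{\otimes n}(x)}{q^{\otimes n}(x)} \ge a\right\}$$

is a classical Neyman-Pearson test for discriminating between $p$ and $q$. One can easily verify that $N_{n,a} = \{x \in \mathcal{X}^n : T_x \in \mathcal{N}_a\}$, where

$$\mathcal{N}_a := \{\mu \in \mathcal{M}_1(\mathcal{X}) : S(\mu\|q) - S(\mu\|p) \ge a\} = \Big\{\mu \in \mathcal{M}_1(\mathcal{X}) : \sum_{y}\mu(y)\log\frac{p(y)}{q(y)} \ge a\Big\}$$

is the intersection of $\mathcal{M}_1(\mathcal{X})$ with the half-space $\{f \in \mathbb{R}^{\mathcal{X}} : \sum_y f(y)v(y) \ge a\}$, where $v$ is the normal vector $v(y) := \log\frac{p(y)}{q(y)}$, $y \in \mathcal{X}$. We also define $\partial\mathcal{N}_a := \{\mu \in \mathcal{M}_1(\mathcal{X}) : S(\mu\|q) - S(\mu\|p) = a\}$.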

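The following lemma has been shown in [32] (see also Theorem 3.1 in [21] for a slightly different proof):

Lemma 4.2. For every $a \in \mathbb{R}$ and $n \in \mathbb{N}$, we have $2e_n(a) \ge \tilde e_n(a)$.

Hence, in order to give lower bounds on the mixed error probabilities $e_n(a)$, it is enough to find lower bounds on $\tilde e_n(a)$. Let $\mathcal{X}^\infty := \times_{k=1}^{+\infty}\mathcal{X}$ be equipped with the sigma-field generated by the cylinder sets, and let $Y_k(x) := \log\frac{p(x_k)}{q(x_k)}$, $x \in \mathcal{X}^\infty$, $k \in \mathbb{N}$. Then $Y_1, Y_2, \ldots$ is a sequence of i.i.d. random variables on $\mathcal{X}^\infty$ with respect to any product measure. By (27), we have

$$\tilde e_n(a) = e^{-na}\tilde\alpha_n(a) + \tilde\beta_n(a),$$

where $\tilde\alpha_n(a) := p^{\otimes n}(\mathcal{X}^n \setminus N_{n,a})$ and $\tilde\beta_n(a) := q^{\otimes n}(N_{n,a})$, or equivalently,

$$\tilde\alpha_n(a) = p^{\otimes\infty}\Big(\frac1n\sum_{k=1}^n Y_k < a\Big), \qquad \tilde\beta_n(a) = q^{\otimes\infty}\Big(\frac1n\sum_{k=1}^n Y_k \ge a\Big).$$

Note that with $\hat p := p/p(\mathcal{X})$ and $\hat q := q/q(\mathcal{X})$, we have

$$\mathbb{E}_{\hat p}Y_1 = S(p\|q)/p(\mathcal{X}) = \psi'(1), \qquad \mathbb{E}_{\hat q}Y_1 = -S(q\|p)/q(\mathcal{X}) = \psi'(0).$$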
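The two expectation identities can be checked numerically for sub-normalized weights. The sketch below is only an illustration: it assumes $\psi(t) = \log\sum_x p(x)^t q(x)^{1-t}$ and $S(u\|v) = \sum_x u(x)(\log u(x) - \log v(x))$ for positive, possibly unnormalized, $u$ and $v$, and the sample weights and function names are hypothetical.

```python
import math

def psi(t, p, q):
    """psi(t) = log sum_x p(x)^t q(x)^(1-t)."""
    return math.log(sum(px ** t * qx ** (1.0 - t) for px, qx in zip(p, q)))

def psi_prime(t, p, q):
    """Derivative of psi: average of log(p/q) under the tilted weights p^t q^(1-t)."""
    w = [px ** t * qx ** (1.0 - t) for px, qx in zip(p, q)]
    return sum(wi * math.log(px / qx) for wi, px, qx in zip(w, p, q)) / sum(w)

def rel_entr(u, v):
    """S(u||v) for positive, possibly unnormalized, weight vectors u and v."""
    return sum(ux * math.log(ux / vx) for ux, vx in zip(u, v))

# Illustrative sub-normalized weights: p(X) = 0.65 and q(X) = 0.75.
p = (0.10, 0.30, 0.25)
q = (0.40, 0.20, 0.15)

mean_under_p_hat = rel_entr(p, q) / sum(p)    # expectation of Y_1 under p / p(X)
mean_under_q_hat = -rel_entr(q, p) / sum(q)   # expectation of Y_1 under q / q(X)
```

Since $\psi$ is convex, $\psi'(0) \le \psi'(1)$, so the interval of admissible values of $a$ is non-empty whenever the two normalized distributions differ.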

Hence, by the theory of large deviations, $\tilde\alpha_n(a)$ and $\tilde\beta_n(a)$ decay exponentially fast in $n$ when $\psi'(0) < a < \psi'(1)$. Using Theorem 1 in [1], we can obtain more detailed information about the speed of decay:

Proposition 4.3. For every $\psi'(0) < a < \psi'(1)$, there exist constants $c_1, c_2, d_1, d_2$, depending on $\rho$, $\sigma$ and $a$, such that for every $n \in \mathbb{N}$,

$$-\hat\varphi(a) - \frac12\frac{\log n}{n} + \frac{c_1}{n} \le \frac1n\log\tilde\alpha_n(a) \le -\hat\varphi(a) - \frac12\frac{\log n}{n} + \frac{c_2}{n}, \tag{28}$$

$$-\varphi(a) - \frac12\frac{\log n}{n} + \frac{d_1}{n} \le \frac1n\log\tilde\beta_n(a) \le -\varphi(a) - \frac12\frac{\log n}{n} + \frac{d_2}{n}. \tag{29}$$

Proof. Note that the moment generating function of $Y_1$ with respect to $\hat q$ is $M(t) := \mathbb{E}_{\hat q}(e^{tY_1}) = \sum_{x\in\mathcal{X}}p(x)^t q(x)^{1-t}/q(\mathcal{X})$, and hence $\inf_{t\in\mathbb{R}}e^{-ta}M(t) = e^{-\varphi(a) - \log q(\mathcal{X})}$. The bounds in (29) then follow immediately from Theorem 1 in [1], and the bounds in (28) can be proven exactly the same way. □

Remark 4.4. It is easy to see that $\psi'(0) < a < \psi'(1)$ if and only if there exists an $r$ such that $-\psi(1) < r < -\psi(0) - \psi'(0)$ and $a = a_r$. Hence, Proposition 4.3 can be reformulated in the following way: For every $-\psi(1) < r < -\psi(0) - \psi'(0)$, there exist constants $\gamma_1, \gamma_2, \delta_1, \delta_2$, depending on $\rho$, $\sigma$ and $r$, such that for every $n \in \mathbb{N}$,

$$-r - \frac12\frac{\log n}{n} + \frac{\gamma_1}{n} \le \frac1n\log\tilde\alpha_{n,r} \le -r - \frac12\frac{\log n}{n} + \frac{\gamma_2}{n},$$

$$-H_r - \frac12\frac{\log n}{n} + \frac{\delta_1}{n} \le \frac1n\log\tilde\beta_{n,r} \le -H_r - \frac12\frac{\log n}{n} + \frac{\delta_2}{n},$$

where $\tilde\alpha_{n,r} := \tilde\alpha_n(a_r)$, $\tilde\beta_{n,r} := \tilde\beta_n(a_r)$.

Corollary 4.5. For every $\psi'(0) < a < \psi'(1)$, there exists a constant $c$, depending on $\rho$, $\sigma$ and $a$, such that for every $n \in \mathbb{N}$,

$$\frac1n\log e_n(a) \ge -\varphi(a) - \frac12\frac{\log n}{n} + \frac{c}{n}.$$

In particular, if $\psi'(0) < 0 < \psi'(1)$ then

$$\frac1n\log e_n(0) \ge -C(\rho\|\sigma) - \frac12\frac{\log n}{n} + \frac{c}{n}.$$

Equivalently, for every $-\psi(1) < r < -\psi(0) - \psi'(0)$, there exists a constant $\gamma$, depending on $\rho$, $\sigma$ and $r$, such that for every $n \in \mathbb{N}$,

$$\frac1n\log e_n(a_r) \ge -H_r - \frac12\frac{\log n}{n} + \frac{\gamma}{n}.$$

Proof. Immediate from Lemma 4.2, Proposition 4.3 and Remark 4.4. □
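The $\frac12\frac{\log n}{n}$ correction of Proposition 4.3 can be observed numerically in the simplest classical setting. The sketch below assumes a binary alphabet with normalized distributions, where `p` and `q` denote the probabilities of the symbol 1 under the two hypotheses; all names and sample values are illustrative. It evaluates $\tilde\beta_n(a)$ exactly and tabulates $n\big(\frac1n\log\tilde\beta_n(a) + \varphi(a)\big) + \frac12\log n$, which should remain bounded in $n$ if (29) holds.

```python
import math

def log_binom_pmf(n, k, w):
    """log of C(n,k) w^k (1-w)^(n-k), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(w) + (n - k) * math.log(1.0 - w))

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_beta_tilde(n, a, p, q):
    """log q^n( (1/n) sum_k Y_k >= a ) with Y = log(p/q) on the alphabet {0, 1}."""
    y0 = math.log((1.0 - p) / (1.0 - q))  # value of Y on symbol 0
    y1 = math.log(p / q)                  # value of Y on symbol 1
    terms = [log_binom_pmf(n, k, q) for k in range(n + 1)
             if k * y1 + (n - k) * y0 >= n * a]
    return log_sum_exp(terms)

def phi(a, p, q, iters=200):
    """phi(a) = sup_{0<=t<=1} {a*t - psi(t)} by ternary search (concave objective)."""
    def g(t):
        return a * t - math.log(p ** t * q ** (1 - t)
                                + (1 - p) ** t * (1 - q) ** (1 - t))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

p, q, a = 0.7, 0.4, 0.0  # illustrative values; a = 0 is the Chernoff case
residuals = {}
for n in (50, 100, 200, 400):
    rate = log_beta_tilde(n, a, p, q) / n
    residuals[n] = n * (rate + phi(a, p, q)) + 0.5 * math.log(n)
```

The values in `residuals` play the role of the constants $d_1 \le \cdot \le d_2$ in (29); they oscillate with $n$ because of the lattice structure of the binomial distribution.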



Proposition 4.3 and Remark 4.4 show the following: In the classical case, the leading term in the deviation of the logarithm of the type I and type II errors from their asymptotic values is exactly $-\frac12\frac{\log n}{n}$. Using Lemma 4.2 we can obtain lower bounds on the mixed error probabilities in the quantum case with the same leading term, as shown in Corollary 4.5. Unfortunately, this method does not make it possible to obtain upper bounds on the mixed quantum errors, or bounds on the individual quantum errors. Another drawback of the above bounds is that the constants in the $1/n$ term depend on $a$ (or $r$) in a very complicated way, and hence it is difficult to see whether for small $n$ it is actually the $\frac{\log n}{n}$ term or the $1/n$ term that dominates the deviation. Below we give similar lower bounds on the classical type I and type II errors, and hence also on the mixed quantum errors, where all constants are parameter-independent and easy to evaluate, at the expense of increasing the constant before the $\frac{\log n}{n}$ term. To reduce redundancy, we formulate the bounds only for $\tilde\alpha_{n,r}$ and $\tilde\beta_{n,r}$; the corresponding bounds for $\tilde\alpha_n(a)$ and $\tilde\beta_n(a)$ follow by an obvious reformulation.

Proposition 4.6. For every $-\psi(1) < r < -\psi(0) - \psi'(0)$ and $n > |\mathcal{X}|(|\mathcal{X}| - 1)$,

$$\frac1n\log\tilde\alpha_{n,r} \ge -r - \frac{3(|\mathcal{X}|-1)}{2}\frac{\log n}{n} - \frac{c_n}{n} + \frac{1}{n(12n+1)}, \tag{30}$$

$$\frac1n\log\tilde\beta_{n,r} \ge -H_r - \frac{3(|\mathcal{X}|-1)}{2}\frac{\log n}{n} - \frac{d_n}{n} + \frac{1}{n(12n+1)}, \tag{31}$$

where $c_n$ in (30) can be upper bounded as

$$c_n \le (|\mathcal{X}|-1)(1 + \log p_{\min}^{-2}) + 1.3,$$

and for large enough $n$,

$$c_n = (|\mathcal{X}|-1)(1 + \log p_{\min}^{-2}) - |\mathcal{X}|\left(\log\sqrt{|\mathcal{X}|/2\pi} - 1/12\right),$$

where $p_{\min} := \min_{x\in\mathcal{X}}\{p(x)\}$. The same statements hold for $d_n$ in (31), with $p_{\min}$ replaced with $q_{\min} := \min_{x\in\mathcal{X}}\{q(x)\}$.



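Proof. The proofs of (30) and (31) go exactly the same way; below we prove (31). Let $t_r$ be as in Lemma 2.6. By Lemma 2.6, we have $S(\mu_{t_r}\|q) - S(\mu_{t_r}\|p) = H_r - r = a_r$, and hence $\mu_{t_r} \in \partial\mathcal{N}_{a_r}$. For a fixed $r$ and $n > |\mathcal{X}|(|\mathcal{X}| - 1)$, let $x \in \mathcal{X}^n$ be a sequence such that

$$a_r \le \frac1n\log\frac{p^{\otimes n}(x)}{q^{\otimes n}(x)} = \sum_{y\in\mathcal{X}}T_x(y)\log\frac{p(y)}{q(y)} \quad\text{and}\quad \|\mu_{t_r} - T_x\|_1 \le \frac{2(|\mathcal{X}|-1)}{n}.$$

The existence of such a sequence is guaranteed by Lemma 2.12. Obviously, $T_x \in \mathcal{N}_{a_r}$, so every sequence with the same type as $x$ belongs to $N_{n,a_r}$, and hence

$$\tilde\beta_{n,r} \ge q^{\otimes n}(\{y \in \mathcal{X}^n : T_y = T_x\}).$$

Using then Lemma 2.11, we obtain

$$\frac1n\log\tilde\beta_{n,r} \ge -S(T_x\|q) - \frac{s_x - 1}{2}\frac{\log n}{n} + \frac{s_x}{n}\left(\log\sqrt{\frac{s_x}{2\pi}} - \frac{1}{12}\right) + \frac{1}{n(12n+1)}, \tag{32}$$

where $s_x := |\operatorname{supp}T_x|$. By Lemma 2.6, $H_r = S(\mu_{t_r}\|q)$, and using Lemma 2.4 yields, with $k := |\mathcal{X}| - 1$,

$$|S(T_x\|q) - H_r| = |S(T_x\|q) - S(\mu_{t_r}\|q)| \le (k/n)\log k + h_2(k/n) - (2k/n)\log q_{\min}.$$

Note that $\eta(x) := -x\ln x$ is concave, and hence $\eta(x) \le \eta(1) + \eta'(1)(x-1) = 1 - x$, which in turn yields

$$h_2(k/n) = -\frac kn\log\frac kn - \left(1 - \frac kn\right)\log\left(1 - \frac kn\right) \le \frac kn\log n - \frac kn\log k + \frac kn,$$

and hence,

$$-S(T_x\|q) \ge -H_r - (k/n)\log k - h_2(k/n) + (2k/n)\log q_{\min}$$
$$\ge -H_r - (k/n)\log k - \frac kn\log n + \frac kn\log k - \frac kn + (2k/n)\log q_{\min}$$
$$= -H_r - \frac kn\log n - \frac kn + (2k/n)\log q_{\min}.$$

Finally, combining the above lower bound with (32), and using $s_x - 1 \le k$, we obtain

$$\frac1n\log\tilde\beta_{n,r} \ge -H_r - \frac{3(|\mathcal{X}|-1)}{2}\frac{\log n}{n} - \frac{c}{n} + \frac{1}{n(12n+1)},$$

where

$$c = (|\mathcal{X}| - 1)(1 + \log q_{\min}^{-2}) - s_x\left(\log\sqrt{s_x/2\pi} - 1/12\right).$$

It is easy to see that the lowest value of $f(n) := n\left(\log\sqrt{n/(2\pi)} - 1/12\right)$, $n \in \mathbb{N}$, is at $n = 2$, and is lower bounded by $-1.3$. Moreover, for large enough $n$, $\operatorname{supp}T_x = \operatorname{supp}\mu_{t_r} = \mathcal{X}$, which yields the statements about $c_n$. □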



Combining Proposition 4.6 with Lemma 4.2, we obtain the following lower bounds on the quantum mixed error probabilities:

Theorem 4.7. Let $d$ be the dimension of the subspace on which $\rho$ and $\sigma$ are supported. For every $-\psi(1) < r < -\psi(0) - \psi'(0)$ and $n > d^2(d^2 - 1)$, we have

$$\frac1n\log e_n(a_r) \ge -H_r(\rho\|\sigma) - \frac{3(d^2-1)}{2}\frac{\log n}{n} - \frac cn + \frac{1}{n(12n+1)}, \tag{33}$$

where $c$ is a constant depending only on $\rho$ and $\sigma$.

If, moreover, there exists a $t \in (0,1)$ such that $\psi'(t) = 0$ then

$$\frac1n\log e_n(0) \ge -C(\rho\|\sigma) - \frac{3(d^2-1)}{2}\frac{\log n}{n} - \frac cn + \frac{1}{n(12n+1)}.$$

Proof. The inequality in (33) is immediate from Lemma 4.2 and Proposition 4.6 by taking into account that $|\operatorname{supp}p \cup \operatorname{supp}q| \le d^2$. This bound applies to the Chernoff error, i.e., the case $a = 0$, if $0 = a_r = \psi'(t_r)$ for some $-\psi(1) < r < -\psi(0) - \psi'(0)$, which is equivalent to the existence of a $t \in (0,1)$ such that $\psi'(t) = 0$. □

Remark 4.8. By the bound given in Proposition 4.6, the constant $c$ in Theorem 4.7 can be upper bounded as

$$c \le (d^2 - 1)(1 - 2\log\min\{p_{\min}, q_{\min}\}) + 1.3,$$

where

$$p_{\min} := \min\{\lambda_i\operatorname{Tr}P_iQ_j : \operatorname{Tr}P_iQ_j > 0\}, \qquad q_{\min} := \min\{\eta_j\operatorname{Tr}P_iQ_j : \operatorname{Tr}P_iQ_j > 0\},$$

and $\rho = \sum_i\lambda_iP_i$, $\sigma = \sum_j\eta_jQ_j$ are the spectral decompositions of $\rho$ and $\sigma$, respectively.
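To get a feeling for the size of the finite-size correction in Theorem 4.7, one can evaluate it with the explicit constant of Remark 4.8. The sketch below assumes that the correction has the form $\frac{3(d^2-1)}{2}\frac{\log n}{n} + \frac cn - \frac{1}{n(12n+1)}$; the function names and the sample values of $d$, $p_{\min}$ and $q_{\min}$ are hypothetical and chosen only for illustration.

```python
import math

def remark_4_8_constant(d, p_min, q_min):
    """Upper bound on c: (d^2 - 1) * (1 - 2 log min{p_min, q_min}) + 1.3."""
    return (d * d - 1) * (1.0 - 2.0 * math.log(min(p_min, q_min))) + 1.3

def finite_size_correction(n, d, c):
    """Amount by which the lower bound on (1/n) log e_n falls below -H_r (or -C)."""
    if n <= d * d * (d * d - 1):
        raise ValueError("the bound is stated only for n > d^2 (d^2 - 1)")
    return (1.5 * (d * d - 1) * math.log(n) / n + c / n
            - 1.0 / (n * (12 * n + 1)))

# Illustrative (assumed) values for a pair of qubit states.
d, p_min, q_min = 2, 0.05, 0.10
c = remark_4_8_constant(d, p_min, q_min)
corrections = [finite_size_correction(n, d, c) for n in (10**2, 10**3, 10**4, 10**5)]
```

For these sample values the correction is about 0.43 at $n = 100$ and falls below $10^{-3}$ at $n = 10^5$, so the bound is informative only when the exponent $H_r$ or $C(\rho\|\sigma)$ is large compared with these numbers.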



5 Closing remarks 

In this paper we studied the finite-size behaviour of various error probabilities related to binary state discrimination. In the classical case, the error probabilities $\alpha_n(a)$ and $\beta_n(a)$, corresponding to the Neyman-Pearson tests, can be written as large deviation probabilities, and their exponential decay rate is given by Cramér's large deviation theorem [11]. If $P_n(a)$ denotes $\alpha_n(a)$, $\beta_n(a)$, or the mixed error probability $e_n(a)$, for some $a \in \mathbb{R}$, then the upper bound of Cramér's large deviation theorem tells that $P_n(a) \le e^{-nI(a)}$, where $I(a) > 0$ for the relevant values of $a$. The more refined large deviation theorem of Bahadur and Rao [1] yields a faster decay, of the form

$$P_n(a) \le \frac{C(a)}{\sqrt n}e^{-nI(a)}, \tag{34}$$

where $C(a)$ is a constant (depending on $a$ but not on $n$). Moreover, it shows that this bound is optimal in the sense that there exists another constant $c(a)$ such that $\frac{c(a)}{\sqrt n}e^{-nI(a)} \le P_n(a)$. (See also [36] for an upper bound on the constant $C(a)$, and [9] for an extension to correlated random variables.) By mapping the quantum problem into a classical one, using the method of Nussbaum and Szkoła [32], one can easily obtain a lower bound on the mixed error probability $e_n(a)$ of the form $\frac{c(a)}{\sqrt n}e^{-nI(a)} \le e_n(a)$, as given in Corollary 4.5. Unfortunately, with this method it is only possible to obtain a lower bound, and only on the mixed error probabilities $e_n(a)$, and not on the individual error probabilities $\alpha_n(a)$ and $\beta_n(a)$. It shows nevertheless that it is not possible to obtain a faster decay of the mixed error probabilities in the quantum than in the classical case. On the other hand, it remains an open problem whether the optimal decay rate can be attained by using only separable measurements. A different approach to refining Cramér's theorem was developed by Hoeffding [23], using the method of types. Although this method yields a somewhat looser lower bound, its advantage is that the constants can be easily bounded by simple expressions that are independent of $a$; see Theorem 4.7 and Remark 4.8 for the quantum versions.

Unlike for the above error probabilities, it is not clear whether the optimal error probabilities $\beta_{n,\varepsilon}$ of Stein's lemma and $\beta_{n,e^{-nr}}$ of the Hoeffding bound can be written as large deviation probabilities for some sequence of random variables. In Section 3 we used a linear programming approach to obtain bounds on these error probabilities. Theorem 3.3 shows that $\beta_{n,e^{-nr}} \le C(r)e^{-nH_r(\rho\|\sigma)}$ for some constant $C(r) < 1$ which can also be easily evaluated. This bound is clearly not optimal in the classical case, as $\beta_{n,e^{-nr}} \le \beta_n(a_r)$, and the latter can be upper bounded in the form $\beta_n(a_r) \le \frac{C(a_r)}{\sqrt n}e^{-nH_r(\rho\|\sigma)}$ (cf. Proposition 4.1 and Remark 4.4). However, at the moment the bound of Theorem 3.3 seems to be the best available one for the quantum case.

To the best of our knowledge, the most detailed information about the asymptotics of $\beta_{n,\varepsilon}$ so far (even in the classical case) was that $\lim_{n\to\infty}\frac1n\log\beta_{n,\varepsilon} = -S(\rho\|\sigma)$. Our bounds in Theorem 3.3 give more detailed information, namely that the deviation of the error rate $\frac1n\log\beta_{n,\varepsilon}$ from its limit $-S(\rho\|\sigma)$ is at most of the order of $1/\sqrt n$, i.e.,

$$-\frac{f(\varepsilon)}{\sqrt n} \le \frac1n\log\beta_{n,\varepsilon} + S(\rho\|\sigma) \le \frac{g(\varepsilon)}{\sqrt n}, \tag{35}$$

where

$$f(\varepsilon) = 4\sqrt2\,\log\eta\,\log(1-\varepsilon)^{-1}, \qquad g(\varepsilon) = 4\sqrt2\,\log\eta\,\log\varepsilon^{-1}.$$

Note that here $f(\varepsilon) > 0$ and $g(\varepsilon) > 0$ for every $\varepsilon \in (0,1)$. Two questions arise naturally related to the bounds in (35). The first is whether $1/\sqrt n$ is the true order of the deviation. Indeed, it could be possible that the convergence of $\frac1n\log\beta_{n,\varepsilon}$ to $-S(\rho\|\sigma)$ is actually much faster, but still compatible with the bounds in (35). The second is whether the upper bound could be improved by replacing $g(\varepsilon)$, which is strictly positive for every $\varepsilon \in (0,1)$, with some negative function $h(\varepsilon)$. Indeed, note that the upper bound in (35) can be written in the form

$$\beta_{n,\varepsilon} \le e^{-nS(\rho\|\sigma)}e^{g(\varepsilon)\sqrt n},$$

i.e., the correction to the exponentially decaying term goes to $+\infty$ as $n \to +\infty$, whereas in (34) we obtained a monotonically decaying correction that vanishes asymptotically. The answers to both of these questions can be extracted from the recent paper [26], as we show below.

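Theorem 3 in [26] says that for given (non-identical) states $\rho$ and $\sigma$ with $\operatorname{supp}\rho \le \operatorname{supp}\sigma$, every $E_2 \in \mathbb{R}$, and every sequence of measurements $\{T_n, I_n - T_n\}_{n\in\mathbb{N}}$, if

$$\limsup_{n\to+\infty}\sqrt n\left(\frac1n\log\beta_n(T_n) + S(\rho\|\sigma)\right) \le -E_2 \tag{36}$$

then

$$\liminf_{n\to+\infty}\alpha_n(T_n) \ge \Phi\left(\frac{E_2}{\sqrt{V(\rho\|\sigma)}}\right),$$

where $V(\rho\|\sigma) := \operatorname{Tr}\rho(\log\rho - \log\sigma)^2 - S(\rho\|\sigma)^2$, and $\Phi(x) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^2/2}\,dt$ is the cumulative distribution function of the standard normal distribution. Moreover, there exists a sequence of measurements $\{T_n, I_n - T_n\}_{n\in\mathbb{N}}$ such that (36) holds, and

$$\lim_{n\to+\infty}\alpha_n(T_n) = \Phi\left(\frac{E_2}{\sqrt{V(\rho\|\sigma)}}\right).$$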

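Consider now all sequences of measurements $\{T_n, I_n - T_n\}_{n\in\mathbb{N}}$ such that $\varepsilon(\{T_n\}) := \lim_{n\to+\infty}\alpha_n(T_n)$ exists, and for all such measurements, let

$$E_2(\{T_n\}) := -\limsup_{n\to+\infty}\sqrt n\left(\frac1n\log\beta_n(T_n) + S(\rho\|\sigma)\right).$$

The above mentioned results of [26] yield that

$$E_2(\{T_n\}) \le \sqrt{V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon(\{T_n\})), \tag{37}$$

where the upper bound is sharp. Let $\varepsilon \in (0,1)$ and for every $n \in \mathbb{N}$, let $T_{n,\varepsilon}$ be a measurement such that $\beta_{n,\varepsilon} = \beta_n(T_{n,\varepsilon})$. It is easy to see that we can choose $T_{n,\varepsilon}$ such that it also satisfies $\alpha_n(T_{n,\varepsilon}) = \varepsilon$; in particular, $\lim_{n\to+\infty}\alpha_n(T_{n,\varepsilon}) = \varepsilon$. It is also easy to see, from the definition of $\beta_{n,\varepsilon}$ and some simple continuity argument, that

$$-\limsup_{n\to+\infty}\sqrt n\left(\frac1n\log\beta_{n,\varepsilon} + S(\rho\|\sigma)\right) \ge -\limsup_{n\to+\infty}\sqrt n\left(\frac1n\log\beta_n(T_n) + S(\rho\|\sigma)\right)$$

for any sequence of measurements $\{T_n, I_n - T_n\}$ such that $\varepsilon(\{T_n\}) = \varepsilon$. Taking into account the sharpness of the bound in (37), we obtain that

$$\limsup_{n\to+\infty}\sqrt n\left(\frac1n\log\beta_{n,\varepsilon} + S(\rho\|\sigma)\right) = -\sqrt{V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon). \tag{38}$$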

This shows that the correct order of the deviation of $\frac1n\log\beta_{n,\varepsilon}$ from $-S(\rho\|\sigma)$ is indeed $1/\sqrt n$ (at least for $\varepsilon \ne 1/2$, since then $\Phi^{-1}(\varepsilon) \ne 0$). From this we can also conclude that $\beta_{n,\varepsilon}$ cannot be written as a large deviation probability for the ergodic average of a sequence of i.i.d. random variables, since then the order of the deviation would be $\frac12\frac{\log n}{n}$, according to the Bahadur-Rao bound.
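For commuting states the quantities entering (38) are easy to evaluate. The sketch below computes $S$, $V$ and the second-order expression $-S - \sqrt{V/n}\,\Phi^{-1}(\varepsilon)$ for two classical distributions. Note that (38) is a statement about a limsup only, so treating this expression as an approximation to $\frac1n\log\beta_{n,\varepsilon}$ at finite $n$ is a heuristic assumption; the function names and the sample distributions are illustrative.

```python
import math
from statistics import NormalDist

def relative_entropy(p, q):
    """S(p||q) = sum_x p(x) (log p(x) - log q(x))."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def relative_entropy_variance(p, q):
    """V(p||q) = sum_x p(x) (log p(x) - log q(x))^2 - S(p||q)^2."""
    s = relative_entropy(p, q)
    second_moment = sum(px * math.log(px / qx) ** 2
                        for px, qx in zip(p, q) if px > 0)
    return second_moment - s * s

def second_order_rate(n, eps, p, q):
    """Heuristic value of (1/n) log beta_{n,eps}: -S - sqrt(V/n) * Phi^{-1}(eps)."""
    s = relative_entropy(p, q)
    v = relative_entropy_variance(p, q)
    return -s - math.sqrt(v / n) * NormalDist().inv_cdf(eps)

# Illustrative (assumed) classical example.
p, q = (0.7, 0.3), (0.4, 0.6)
S = relative_entropy(p, q)
V = relative_entropy_variance(p, q)
```

For $\varepsilon < 1/2$ the correction is positive, so the heuristic rate lies above $-S$, in line with the sign discussion around $g(\varepsilon)$ and $h(\varepsilon)$.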



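Moreover, (38) yields that for any $\varepsilon' \in (\varepsilon, 1)$ there exist infinitely many $n \in \mathbb{N}$ such that

$$\sqrt n\left(\frac1n\log\beta_{n,\varepsilon} + S(\rho\|\sigma)\right) \ge -\sqrt{V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon'),$$

or equivalently,

$$\frac1n\log\beta_{n,\varepsilon} \ge -S(\rho\|\sigma) - \frac{1}{\sqrt n}\sqrt{V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon'). \tag{39}$$

In particular, if $\varepsilon < \varepsilon' < 1/2$ then $-\sqrt{V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon') > 0$, and (39) shows that it is not possible to have an upper bound as in (35) with some $h(\varepsilon) < 0$ in place of $g(\varepsilon)$ for $\varepsilon \in (0, 1/2)$.

Appendix: Binary Classical Case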

In this Appendix we treat the problem of finding sharp upper and lower bounds on the error probability of discriminating between two binary random variables (r.v.). One has a distribution $(p, 1-p)$, and the other $(q, 1-q)$, with $0 < p, q < 1$. We assume that both r.v.'s have the same prior probability, namely 1/2. We consider the mixed error probability $e_n(a)$ for a Neyman-Pearson test (governed by the parameter $a$) applied to $n$ identically distributed independent copies of the r.v.'s. This error probability is given by

$$e_n(a) = \frac12\sum_{k=0}^{n}\binom nk\min\left(e^{-na}p^k(1-p)^{n-k},\ q^k(1-q)^{n-k}\right). \tag{40}$$

In the limit of large $n$, this error probability goes to zero exponentially fast, and the rate $-(\log e_n(a))/n$ tends to $\varphi(a)$ defined as

$$\varphi(a) = \sup_{0\le t\le1}\{at - \psi(t)\}, \qquad \psi(t) = \log\left(p^tq^{1-t} + (1-p)^t(1-q)^{1-t}\right). \tag{41}$$

From this function we can derive the Hoeffding distance between the two distributions:

$$H_r = \sup_{0\le t<1}\frac{-rt - \psi(t)}{1-t}. \tag{42}$$

Here we are interested in the finite $n$ behaviour of $e_n$, namely at what rate does $-(\log e_n)/n$ itself tend to its limit. Because we are dealing with binary r.v.'s, $e_n$ is governed by two binomial distributions. Let $P_{k,n} = \binom nk p^k(1-p)^{n-k}$ and $Q_{k,n} = \binom nk q^k(1-q)^{n-k}$. By writing the binomial coefficient in terms of gamma functions, rather than factorials, the values of these distributions can be calculated for non-integer $k$ (even though these values have no immediate statistical meaning). We can then solve the equation $e^{-na}P_{k,n} = Q_{k,n}$ for $k$ and get the point where one term in (40) becomes bigger than the second. Let $k = sn$ be that point. Assuming that $p < q$ we can then rewrite (40) as

$$e_n(a) = \frac12\left(\sum_{k=0}^{\lfloor sn\rfloor}\binom nk q^k(1-q)^{n-k} + \sum_{k=1+\lfloor sn\rfloor}^{n}\binom nk e^{-na}p^k(1-p)^{n-k}\right). \tag{43}$$

The value of $s$ is the solution of the equation

$$e^{-na}p^{sn}(1-p)^{(1-s)n} = q^{sn}(1-q)^{(1-s)n},$$

which is equivalent to

$$s\log p + (1-s)\log(1-p) - a = s\log q + (1-s)\log(1-q),$$

hence $s$ is given by

$$s(a) = \frac{\log\frac{1-p}{1-q} - a}{\log\frac{1-p}{1-q} + \log\frac qp}. \tag{44}$$

Alternatively, $s(a)$ is the value of $s$ that minimises (43).
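The equivalence of the two expressions (40) and (43) for $e_n(a)$ can be checked directly for small $n$. The sketch below assumes $p < q$ and values of $a$ for which $0 \le s(a) \le 1$; the function names and the sample parameters are illustrative.

```python
import math

def s_of_a(a, p, q):
    """Crossing point s(a) solving
    s log p + (1-s) log(1-p) - a = s log q + (1-s) log(1-q)."""
    num = math.log((1.0 - p) / (1.0 - q)) - a
    den = math.log((1.0 - p) / (1.0 - q)) + math.log(q / p)
    return num / den

def e_n_min_form(n, a, p, q):
    """(40): half the sum over k of C(n,k) times the smaller of the two weights."""
    return 0.5 * sum(
        math.comb(n, k) * min(math.exp(-n * a) * p ** k * (1 - p) ** (n - k),
                              q ** k * (1 - q) ** (n - k))
        for k in range(n + 1))

def e_n_split_form(n, a, p, q):
    """(43): the sum split at floor(s*n); assumes p < q and 0 <= s(a) <= 1."""
    cut = math.floor(s_of_a(a, p, q) * n)
    q_part = sum(math.comb(n, k) * q ** k * (1 - q) ** (n - k)
                 for k in range(0, cut + 1))
    p_part = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                 for k in range(cut + 1, n + 1))
    return 0.5 * (q_part + math.exp(-n * a) * p_part)

p, q = 0.2, 0.6  # illustrative values with p < q
```

For example, `e_n_min_form(10, 0.0, p, q)` and `e_n_split_form(10, 0.0, p, q)` should agree up to rounding, with the split occurring at $\lfloor sn\rfloor = 3$.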

The summations in (43) can be replaced by an integral, each giving rise to a regularised incomplete beta function, using the formula for the cumulative distribution function (CDF) of the binomial distribution

$$\sum_{k=0}^{k_0}\binom nk p^k(1-p)^{n-k} = I_{1-p}(n - k_0, k_0 + 1). \tag{45}$$

The regularised incomplete beta function $I_z(k,l)$ is defined as

$$I_z(k,l) = \frac{B(z,k,l)}{B(k,l)} = \frac{\int_0^z dt\,t^{k-1}(1-t)^{l-1}}{\int_0^1 dt\,t^{k-1}(1-t)^{l-1}}.$$

We thus get

$$e_n(a) = \frac12\left(I_{1-q}(n - \lfloor sn\rfloor, \lfloor sn\rfloor + 1) + e^{-na}\left(1 - I_{1-p}(n - \lfloor sn\rfloor, \lfloor sn\rfloor + 1)\right)\right). \tag{46}$$

Because $e_n(a)$ is just a summation with summation bounds depending on $n$, as witnessed by the floor function appearing here, $e_n(a)$ is a non-smooth function of $n$. To wit, as a function of $n$, $e_n(a)$ exhibits a wave-like pattern, and so does its rate $-\log(e_n(a))/n$, as shown in Fig. 1. The amplitude and period of these waves increase when $p$ becomes extremely small. In order to obtain nice bounds on $e_n(a)$, we will first try to remove the wave patterns by removing the floor function from $e_n(a)$ in a suitable way. More precisely, we look for upper and lower bounds on $e_n(a)$ that are as close to $e_n(a)$ as possible.

The complete and incomplete beta functions have certain monotonicity properties. Since for $t$ between 0 and 1, $t^k$ decreases with $k$, $B(z,k,l)$ decreases with $k$ and with $l$. Thus, we immediately get the bounds

$$B(z, n - sn + 1, sn + 1) \le B(z, n - \lfloor sn\rfloor, \lfloor sn\rfloor + 1) \le B(z, n - sn, sn). \tag{47}$$

For the regularised incomplete beta function this means

$$\frac{B(z, n - sn + 1, sn + 1)}{B(n - sn, sn)} \le I_z(n - \lfloor sn\rfloor, \lfloor sn\rfloor + 1) \le \frac{B(z, n - sn, sn)}{B(n - sn + 1, sn + 1)}.$$

Using the relation $B(k+1, l+1) = \frac{kl}{(k+l)(k+l+1)}B(k,l)$, this yields

$$\frac{ns(1-s)}{n+1}I_z(n - sn + 1, sn + 1) \le I_z(n - \lfloor sn\rfloor, \lfloor sn\rfloor + 1) \le \frac{n+1}{ns(1-s)}I_z(n - sn, sn).$$

Sharper bounds are obtained by using a monotonicity relation applicable for the specific arguments appearing here. Because of relation (45), we see that $I_z(n - x, x)$ is monotonically increasing in $x$ when $x$ is restricted to be an integer between 1 and $n$. It is therefore a reasonable conjecture that it increases monotonically over all real $x$ such that $0 < x < n$.

Lemma A.1. Let $0 \le z \le 1$. The function $x \mapsto I_z(n - x, x)$ is monotonically increasing in $x$ for $0 < x < n$.

Proof. The derivative w.r.t. $x$ is non-negative provided

$$B(n-x,x)\frac{d}{dx}B(z,n-x,x) - B(z,n-x,x)\frac{d}{dx}B(n-x,x) \ge 0$$

holds. The derivative of $B(z, n-x, x)$ is given by

$$\frac{d}{dx}B(z,n-x,x) = \int_0^z dt\,\log((1-t)/t)\,t^{n-x-1}(1-t)^{x-1}.$$

Therefore, the derivative of $I_z(n-x,x)$ is non-negative if

$$\int_0^1 du\,u^{n-x-1}(1-u)^{x-1}\int_0^z dt\,\log((1-t)/t)\,t^{n-x-1}(1-t)^{x-1}$$
$$- \int_0^z du\,u^{n-x-1}(1-u)^{x-1}\int_0^1 dt\,\log((1-t)/t)\,t^{n-x-1}(1-t)^{x-1} \ge 0.$$

As both terms have the integral over the area $0 \le t, u \le z$ in common, the integrals simplify to

$$\int_z^1 du\,u^{n-x-1}(1-u)^{x-1}\int_0^z dt\,\log((1-t)/t)\,t^{n-x-1}(1-t)^{x-1}$$
$$- \int_0^z du\,u^{n-x-1}(1-u)^{x-1}\int_z^1 dt\,\log((1-t)/t)\,t^{n-x-1}(1-t)^{x-1}.$$

Upon swapping the variables $u$ and $t$ in the second term, this can be rewritten as

$$\int_z^1 du\,u^{n-x-1}(1-u)^{x-1}\int_0^z dt\,\log((1-t)/t)\,t^{n-x-1}(1-t)^{x-1}$$
$$- \int_0^z dt\,t^{n-x-1}(1-t)^{x-1}\int_z^1 du\,\log((1-u)/u)\,u^{n-x-1}(1-u)^{x-1},$$

which simplifies to

$$\int_z^1 du\int_0^z dt\,\big(\log((1-t)/t) - \log((1-u)/u)\big)\,u^{n-x-1}(1-u)^{x-1}\,t^{n-x-1}(1-t)^{x-1}.$$

Since the integral is over a region where $u \ge t$, and $\log((1-t)/t) - \log((1-u)/u) \ge 0$ for $u \ge t$, the integral is indeed non-negative. □
Using the lemma, we then get

$$I_z(n - sn + 1, sn) \le I_z(n - \lfloor sn\rfloor, \lfloor sn\rfloor + 1) \le I_z(n - sn, sn + 1).$$

This yields upper and lower bounds on $e_n(a)$ given by

$$e_n(a) \ge \left(I_{1-q}(n(1-s)+1, ns) + e^{-na}I_p(ns+1, n(1-s))\right)/2, \tag{48}$$
$$e_n(a) \le \left(I_{1-q}(n(1-s), ns+1) + e^{-na}I_p(ns, n(1-s)+1)\right)/2. \tag{49}$$

Here we have used the relation $1 - I_z(a,b) = I_{1-z}(b,a)$. Numerical computation shows that the large-$n$ behaviour of these bounds is consistent with the predictions of Proposition 4.3. Two concrete examples are depicted below.
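The bounds (48) and (49) can be checked against the exact value of $e_n(a)$ with a few lines of code. The sketch below uses only the standard library: the regularised incomplete beta function is evaluated by composite Simpson integration, which is an implementation choice made here for self-containedness and assumes both arguments are at least 1 (a library routine such as SciPy's `betainc` would be the usual choice). The function names and sample parameters are illustrative.

```python
import math

def reg_inc_beta(z, k, l, panels=4000):
    """I_z(k, l) for k, l >= 1: Simpson integral of t^(k-1) (1-t)^(l-1) on [0, z],
    divided by the complete beta function B(k, l). `panels` must be even."""
    if z <= 0.0:
        return 0.0
    h = z / panels
    def f(t):
        return t ** (k - 1.0) * (1.0 - t) ** (l - 1.0)
    total = f(0.0) + f(z)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    log_beta = math.lgamma(k) + math.lgamma(l) - math.lgamma(k + l)
    return (total * h / 3.0) / math.exp(log_beta)

def s_of_a(a, p, q):
    """Crossing point s(a) of the two terms in the sum defining e_n(a)."""
    ratio = math.log((1.0 - p) / (1.0 - q))
    return (ratio - a) / (ratio + math.log(q / p))

def e_n_exact(n, a, p, q):
    """Exact e_n(a): half the sum over k of C(n,k) times the smaller weight."""
    return 0.5 * sum(
        math.comb(n, k) * min(math.exp(-n * a) * p ** k * (1 - p) ** (n - k),
                              q ** k * (1 - q) ** (n - k))
        for k in range(n + 1))

def e_n_bounds(n, a, p, q):
    """Lower bound (48) and upper bound (49) on e_n(a), assuming p < q."""
    s = s_of_a(a, p, q)
    ns, nt = n * s, n * (1.0 - s)
    w = math.exp(-n * a)
    lower = 0.5 * (reg_inc_beta(1 - q, nt + 1, ns) + w * reg_inc_beta(p, ns + 1, nt))
    upper = 0.5 * (reg_inc_beta(1 - q, nt, ns + 1) + w * reg_inc_beta(p, ns, nt + 1))
    return lower, upper
```

Calling `e_n_bounds(20, 0.0, 0.2, 0.6)` and `e_n_exact(20, 0.0, 0.2, 0.6)` gives a smooth envelope around the exact value; repeating this over a range of $n$ reproduces the qualitative picture of a wave-like exact rate between two smooth curves.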





Figure 1: Graph of the error rate function $n \mapsto -\log(e_n(0))/n$, together with lower and upper bounds. Starting from below we have the Chernoff bound (the constant), the lower bound (49), the exact error rate (the oscillating line), and the upper bound (48); the two cases considered are (a) for $p = 0.001$ and $q = 0.5$, and (b) for $p = 10^{-10}$ and $q = 0.5$.